








June, 1953 Vol. 17, No. 3 


CONTENTS 
Process and Reactive Schizophrenia: Robert E. Kantor, Julius M. Wallner,
and C. L. Winder

Prognostic Criteria in the Case Histories of Hospitalized Mental Patients:
G. R. Pascal, C. H. Swensen, Dorothy A. Feldman, Mary E. Cole, and Jean Bayard

An Evaluation of Dymond’s Test of Insight and Empathy: Henry Clay Lindgren
and Jacqueline Robinson

The Present Status of Research on Nondirective Play Therapy: Dell Lebo

The Relative Effectiveness of Larger Units Used in Interview Analysis

On the Supposed Behavioral Correlates of an “Eye” Content Response on the
Rorschach: Michael Wertheimer

The Levy Movement Test: Suggestions for Scoring and Relationship to Rorschach
Movement Responses: Robert M. Allen, Charles D. Ray, and Robert C. Poole

Prognosis in Paranoid Schizophrenia on the Basis of the Rorschach: David Grauer

A Comparison of Scores on the Index of Adjustment and Values with Behavior in
Level-of-Aspiration Tasks: Robert E. Bills

Directionality of Lines in the Bender-Gestalt Test: Roland M. Peek

Test-Retest Stability of MMPI Scales for a Psychiatric Population: Albert Rosen

Perceptual Learning and Age: Bernard Hanes

Wechsler-Bellevue Split-Half Subtest Reliabilities: Differences in Age and Mental
Status: Jack Botwinick

Changes in Wechsler-Bellevue Test Performance following Prefrontal Lobotomy:
Earl D. Markwell, Jr., William M. Wheeler, and Helen Kitzinger








Journal of Consulting Psychology
Vol. 17, No. 3, 1953


Process and Reactive Schizophrenia¹


Robert E. Kantor 
Palo Alto, California 


Julius M. Wallner 


Veterans Administration Hospital, Palo Alto 


and C. L. Winder 


Stanford University 


The process-reactive hypothesis regarding 
schizophrenia may be stated as follows: With- 
in the category schizophrenia two subgroups 
can be differentiated, these being the process 
or true or chronic, and the reactive or benign 
or acute. The hypothesis has usually been 
stated in the dichotomous form, as in this 
study, but may well develop into a dimensional 
concept as it is clarified empirically. 


This hypothesis has appeared explicitly or 
implicitly in psychiatric and psychological lit- 
erature since Bleuler [4, 5, 6] formulated the 
concept of schizophrenia in revision of Kraepe- 
lin’s [14, 15] concept of dementia praecox. 
Bleuler recognized that there were some pa- 
tients who recovered who were classifiable as 
dementia praecox according to Kraepelin’s cri- 
teria, and, therefore, that further differentia- 
tion than that afforded by those criteria was 
necessary if maximum predictability was to be 
obtained. As Meduna and McCullough [22] 
have clearly argued, Bleuler’s distinction has 
not been followed, and so the prediction of 
recovery among those diagnosed schizophrenic 
or dementia praecox has not been as effective 
as it might have been. 


Kretschmer [16] and later Mauz [21] held 
one version of the process-reactive view, be- 
lieving that true schizophrenia occurs pre- 
ponderantly in those of asthenic or dysplastic 
body type. These authors thought that reactive 
schizophrenia occurs in those whose constitu- 
tion does not render them prone to develop 
true schizophrenia. 


¹From the Veterans Administration Hospital,
Palo Alto, California.


Lewis [20] in his 1936 review of dementia- 
praecox researches arrived at the conclusion 
that investigation of the process-reactive hy- 
pothesis was a major indicated direction for 
further research. 

Langfeldt [17] has published evidence in 
favor of the process-reactive view, finding that 
previously identified atypical cases of schizo- 
phrenia improved much more than typical 
cases. Langfeldt, in later papers [18, 19], has 
presented evidence that indefinite schizophren- 
ia, with a favorable prognosis, can be differ- 
entiated from definite schizophrenia. 

Milici [24] and Paskind and Brown [25] 
report cases diagnosed as schizophrenia in 
which the precipitating environmental stress 
or shock was known and where recovery or 
remission was relatively early and complete. 
Reports of many such cases were frequent dur- 
ing and after World War II and will not be 
cited. The inference based on these studies has 
usually been that there is a schizophreniform 
psychosis which develops suddenly and in re- 
sponse to relevant stress and which does not 
follow the classical course, and that many cases 
of so-called typical schizophrenia are so classi- 
fied mistakenly. 

Extensive studies of the childhood histories 
of schizophrenics with a view to implications 
for prognosis have been reported by Wittman 
[26, 27] and by Wittman and Steinberg [28, 
29]. These studies carry on in the tradition 
of Meyer [23], as do studies of Hoch [8] and 
Kasanin and Veo [12]. The view that the 
shut-in personality and onset at early ages are 
characteristic of the process group is supported 
by the results of these studies. Similarly, Hunt
and Appel [9] distinguished a group of patients
whom they labeled schizoaffective and
among whom poorer environmental adjust- 
ment prior to illness was associated with poor 
prognosis. 

Among investigators who have found that 
prognosis in schizophrenia is related to the 
nature of the onset, Kant [10, 11] is notable. 
Unrecovered cases were characterized by grad- 
ual onset or a unique precipitating cause, 
whereas recovered cases had developed in the 
presence of pertinent stress. Kant has also pre- 
sented evidence that Bleuler’s criterion of a 
clear sensorium differentiates the nonrecovering 
group from those with better prognoses. This 
conclusion is supported also by evidence from
Langfeldt [17] and Hunt and Appel [9].

A somewhat different type of evidence is rep- 
resented by a survey of the literature on hos- 
pital remission rates reported by Blair [3]. 
He found that about 40 per cent of diagnosed 
schizophrenics seem to remit spontaneously, 
while about 60 per cent of such cases tend to 
remain unimproved or deteriorate. It can be 
argued that such statistics support the process- 
reactive hypothesis. 

References to additional studies pertinent to 
the process-reactive view will be found in 
Bellak’s [1] recent summary volume in which 
the general conclusions are in support of this 
general point of view. Ebaugh and Benjamin
[7] had previously identified four groups of
schizophrenics as defined by the course of the
disturbance. Just how many types of courses
there are in schizophrenia is thus left in doubt.
The present paper uses a dichotomy for
convenience.


Table 1

Items Defining Frame of Reference for Case History Judgments

Process Schizophrenia                           Reactive Schizophrenia

Birth to the fifth year
a. Early psychological trauma                   a. Good psychological history
b. Physical illness, severe or long             b. Good physical health
c. Odd member of family                         c. Normal member of family

Fifth year to adolescence
a. Difficulties at school                       a. Well adjusted at school
b. Family troubles paralleled with sudden       b. Domestic troubles unaccompanied by behavior
   changes in patient’s behavior                   disruptions. Patient “had what it took.”
c. Introverted behavior trends and interests    c. Extroverted behavior trends and interests
d. History of breakdown of social, physical,    d. History of adequate social, physical,
   mental functioning                              mental functioning
e. Pathological siblings                        e. Normal siblings
f. Overprotective or rejecting mother.          f. Normally protective, accepting mother
   “Momism”
g. Rejecting father                             g. Accepting father

Adolescence to adulthood
a. Lack of heterosexuality                      a. Heterosexual behavior
b. Insidious, gradual onset of psychosis        b. Sudden onset of psychosis; stress present
   without pertinent stress                        and pertinent. Later onset
c. Physical aggression                          c. Verbal aggression
d. Poor response to treatment                   d. Good response to treatment
e. Lengthy stay in hospital                     e. Short course in hospital

Adulthood
a. Massive paranoia                             a. Minor paranoid trends
b. Little capacity for alcohol                  b. Much capacity for alcohol
c. No manic-depressive component                c. Presence of manic-depressive component
d. Failure under adversity                      d. Success despite adversity
e. Discrepancy between ability and achievement  e. Harmony between ability and achievement
f. Awareness of change in self                  f. No sensation of change
g. Somatic delusions                            g. Absence of somatic delusions
h. Clash between culture and environment        h. Harmony between culture and environment
i. Loss of decency (nudity, public              i. Retention of decency
   masturbation, etc.)

It is apparent from the available literature
that there are some patients who present a
symptom complex very similar to the tradition- 
al schizophrenia syndrome, but who to some 
significant degree have a more favorable prog- 
nosis. If one were to formulate a reactive 
schizophrenia syndrome, it would contain the 
following elements: (a) the prepsychotic personality
was relatively normal, (b) the onset
of the psychosis was sudden with logical pre- 
cipitating factors present, and (c) the patient 
did not maintain a clear sensorium. 

The present study stems most directly from 
the work of Benjamin [2], who has pointed 
out that the Rorschach test is useful in reveal- 
ing thinking disorders and who finds that some 
patients diagnosed schizophrenic on the basis 
of psychiatric examination do not display 
thinking disorders or display such signs in 
slight degree. Signs of thinking disorders are 
“relatively independent of the stage of illness 
and the momentary clinical condition of the 
patient. They are, to be sure, found in almost 
all so-called deteriorated cases, but are also 
seen, sometimes, in less pronounced form, at 
very early stages, occasionally long before a 
clinical diagnosis has been made, as well as 
after a clinical recovery from a severe attack” 
[2, p. 67]. Thinking disorders may well be 
process symptoms according to Benjamin and 
the absence of such signs may well be indica- 
tive of reactive schizophrenia. In the present 
study, a nonpsychotic Rorschach diagnosis is 
similar to Benjamin’s absence of thinking dis- 
order, i.e., indicative of reactive schizophrenia, 
and a diagnosis of psychosis is similar to Ben- 
jamin’s thinking-disorder diagnosis, i.e., indica- 
tive of process schizophrenia. 


Problem 


The study reported here is concerned with 
three questions which arise if one entertains 
the process-reactive hypothesis. 

1. Do diagnoses based upon Rorschach ex- 
amination alone label as nonpsychotic a portion 
of the population of mental patients who are 
diagnosed clinically as schizophrenic? 




2. Can case histories of clinically diagnosed 
schizophrenics be differentiated into two cate- 
gories: reactive and process? A search of the 
literature revealed criteria which were used as 
the frame of reference in rating cases as process 
or reactive from social service case histories. 
The criteria are given in Table 1. 

3. Are those cases rated psychotic from the 
Rorschach classified as process on the basis of 
the case histories and those cases judged non- 
psychotic from the Rorschach classified as re- 
active from the case histories? 


Method 


Subjects. Two successive samples of sub- 
jects were studied. The first consisted of all 
patients at the Palo Alto Veterans Hospital 
who were diagnosed schizophrenic and who 
were given Rorschach tests during 1948, a 
total of 180. The second sample is all of the 
schizophrenics (clinical diagnosis) given the 
Rorschach during the first six months of 1949 
at the same hospital. There were 147 such 
patients, three of whom were used in the first 
sample, giving a second-sample total of 144. 

Any patient who could be placed in any of
the following categories was eliminated from
the study: (a) diagnosis of latent schizophrenia
(4 cases), (b) diagnosis made from a battery
of tests rather than the Rorschach alone
(56 cases), (c) fewer than seven main scorable
Rorschach responses (29 cases), (d) diagnosis
of schizophrenia complicated by a secondary
diagnosis (9 cases), (e) change of clinical
diagnosis before an arbitrary date, i.e., May 2,
1949 for the first sample and August 1, 1949
for the second sample (23 cases). In all, 108
first-sample patients and 95 second-sample
cases were retained for rating.
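
The sample accounting in the two preceding paragraphs can be checked with a few lines of arithmetic. The sketch below is only a bookkeeping check on the counts reported in the text; the variable names and category labels are illustrative and are not part of the original study.

```python
# Counts reported in the Subjects section (names are illustrative only).
first_sample = 180            # schizophrenics given the Rorschach during 1948
second_sample = 147 - 3       # first half of 1949, less 3 already in sample 1

# Exclusions as reported, pooled over both samples.
exclusions = {
    "latent schizophrenia": 4,
    "diagnosis made from a test battery": 56,
    "fewer than seven scorable responses": 29,
    "secondary diagnosis": 9,
    "clinical diagnosis changed": 23,
}

total_tested = first_sample + second_sample
total_excluded = sum(exclusions.values())
retained = total_tested - total_excluded

print(total_tested, total_excluded, retained)

# The text reports 108 first-sample and 95 second-sample cases retained.
assert retained == 108 + 95
```

The reported exclusion counts and the reported retained totals agree with one another, which suggests that the five exclusion categories were counted without overlap.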


Diagnostic procedures and ratings. The psy- 
chiatric diagnoses were established on the basis 
of data collected by all appropriate services of 
the hospital, including psychological examina- 
tions. In each case, diagnosis was decided at 
a staff conference including representatives of 
all relevant hospital services and a psychiatric 
consultant. The final diagnosis was reviewed 
by the chief of psychiatry of the hospital. 
These diagnoses did not enter into the study 
directly, but defined the population for study. 

The Rorschach diagnoses were made on the 
basis of tests given by trainee psychologists
each of whom had adequate previous training
in administration of the test. Analysis of the 
protocol was carried out by the person testing 
with the assistance of an advanced trainee. 
The final formulation and the protocol on 
which it was based were then reviewed by a 
staff psychologist who had prior interpretation 
experience with more than five hundred Ror- 
schachs. The analysis criteria used are those 
outlined by Klopfer and Kelley [13]. 

The two judges who rated the case histories 
for the first sample were the chief of psychiatry 
of the Palo Alto Veterans Hospital and a clini- 
cal psychologist from Stanford University. 
Each judge rated from the same history ma- 
terials at different times. There was no dis- 
cussion of the cases by the two judges and none 
of the psychological examination materials 
was available at the time of rating. Cases were
categorized: Process, Reactive, or Cannot Say.


The chief of psychiatry had reviewed each 
case previously, and so had seen psychological 
examination and history materials together at 
a time prior to the time of making the ratings 
for this study. The clinical psychologist judge 
had seen the complete set of materials on only 
a very few of the cases at a previous time. 
Both judges are familiar with the diagnostic 
habits (psychiatric and psychological) at the 
hospital. The reader must decide for himself 
just how these factors contaminate the results. 
It is the conviction of the authors that the re- 
sults were determined little by these factors. 

As of the writing of this paper, the second 
sample has been rated only by the clinical psy- 
chologist judge. Before making the ratings, he 
had contact with only two of the patients and 
their Rorschach records and had seen none of 
the history materials of patients used in the 
second sample. 

Results 


1. Diagnoses based on the Rorschach pro- 
tocols label as nonpsychotic a large portion of 
the population diagnosed schizophrenic by con- 
ventional psychiatric procedures. This conclu- 
sion is supported by the facts that of the 108 
patients from the first sample, 57 were classi- 
fied psychotic and 51 nonpsychotic from the 
Rorschach alone, and of the 74 cases of the 
second sample which were rated process or reactive,
38 were labeled nonpsychotic and 36
psychotic on the basis of the Rorschach. These
results are in agreement with and support the 
conclusion stated above. This result in isola- 
tion might tempt some to be extremely critical 
of the clinical diagnoses or the Rorschach or 
both, since there is so little agreement between 
the two types of diagnoses. 

2. Case histories of clinically diagnosed 
schizophrenics can be differentiated reliably into
two categories: reactive and process. The
relevant results are presented in Table 2, the 
data being from the first sample. Of the 108 
case histories, both judges arrived at a rating 
in 86 cases, and were in agreement in 64 of 
these cases. A χ² test, based on the distribution
of cases shown in Table 2, shows that
this agreement is greater than would be expected
by chance.


Table 2

Agreement between Judges Concerning Process
and Reactive Schizophrenia*

                       Judge A
Judge B        Process   Reactive   Total
Process           47        13        60
Reactive           9        17        26
Total             56        30        86

*χ² = 13.40, p < .001


Table 3

Comparison of Judgments from the
Case History and Rorschach

                                   Judgment
Rorschach                     Reactive   Process

Judge A (First sample)
  Nonpsychotic                    21        19
  Psychotic                       10        37
  χ² = 7.87, .01 > p > .001

Judge B (First sample)
  Nonpsychotic                    29        21
  Psychotic                        5        52
  χ² = 27.55, p < .001

Judge A and B (First sample)
  Nonpsychotic                    15        13
  Psychotic                        2        34
  χ² = 16.24, p < .001

Judge B (Second sample)
  Nonpsychotic                    16        22
  Psychotic                        5        31
  χ² = 7.24, .01 > p > .001
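
The χ² reported for Table 2 can be reproduced from the four cell counts. The sketch below assumes the standard 2 × 2 formula with Yates' continuity correction; the paper does not say which formula was used, and the correction is inferred only because it yields the printed value. The function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction (an assumption; see the lead-in).
        diff = max(diff - n / 2.0, 0.0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denom

# Table 2: rows are Judge B (process, reactive); columns are Judge A.
table2 = (47, 13, 9, 17)

corrected = chi_square_2x2(*table2)
uncorrected = chi_square_2x2(*table2, yates=False)
agreement = (47 + 17) / 86   # proportion of jointly rated cases given the same label

print(round(corrected, 2), round(uncorrected, 2), round(agreement, 3))
```

With the correction the statistic rounds to the reported 13.40; without it the value is about 15.26. The raw agreement, 64 of 86 cases, is roughly 74 per cent.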

3. Those patients diagnosed as schizo- 
phrenic who are rated reactive from the histo- 
ry are most often judged nonpsychotic from 
the Rorschach and those rated process from the 
histories are most often judged psychotic from 
the Rorschach. Table 3 contains the relevant 
data. The independent judgments made from 
the case histories and those made from the 
Rorschach examinations are in greater agree- 
ment than could be expected by chance. This 
is the case for each judge on all rated cases 
and for the cases from the first sample on 
which both judges agreed. 
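
The four χ² values attached to Table 3 can be recomputed in the same way from the cell counts. As before, this is a sketch under the assumption that a 2 × 2 χ² with Yates' continuity correction was used; the paper does not name the formula, and the panel labels below are shorthand rather than the authors' wording.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction (assumed, not stated in the paper).
        diff = max(diff - n / 2.0, 0.0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denom

# Cells per panel: (nonpsychotic-reactive, nonpsychotic-process,
#                   psychotic-reactive,    psychotic-process)
panels = {
    "Judge A, first sample": (21, 19, 10, 37),
    "Judge B, first sample": (29, 21, 5, 52),
    "Judges A and B, first sample": (15, 13, 2, 34),
    "Judge B, second sample": (16, 22, 5, 31),
}

results = {}
for label, cells in panels.items():
    results[label] = (
        chi_square_2x2(*cells),               # with continuity correction
        chi_square_2x2(*cells, yates=False),  # without correction
    )
    print(f"{label}: corrected={results[label][0]:.2f}, "
          f"uncorrected={results[label][1]:.2f}")
```

Under this assumption the three first-sample panels reproduce the printed 7.87, 27.55, and 16.24. For the second sample the corrected value is about 5.92 and the uncorrected value about 7.24, so the printed figure for that panel is closer to the uncorrected statistic. Either value falls below the .05 level for one degree of freedom.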


Discussion 


The central problem under consideration in 
this paper is the status of the reactive-process 
conception of schizophrenia. It is clear that 
this idea is still in a crude form. It is equally 
clear that the general idea has persisted, and 
it is the opinion of the authors that persistence 
of the view in the light of critical clinical 
evaluation reflects some validity. Granting 
this, our problem becomes the clarification, on 
an empirical basis, of this view. 

In this study, some questions relevant to the 
process-reactive view have been investigated 
empirically, with full realization that the im- 
plications of the findings would be in some 
doubt. The procedures employed in this study 
are subject to some criticisms, most of which 
have been mentioned. For these reasons the 
study must be viewed as preliminary in nature. 

It was determined that diagnosed schizophrenics
could be further separated into the
categories process and reactive with some
reliability. No information is available at present
to indicate with precision the meaning of
these categories. One thought is that the
reactive cases are either milder or earlier than
process cases. If Benjamin is correct in his
opinion that the presence or absence of
indicators of thinking disorders is relatively
independent of duration and degree of illness,
and if, as our data suggest, the process cases
show evidences of thinking disorders while
reactive cases do not, then the equating of
reactive with early or mild schizophrenia is at
least insufficient and is possibly incorrect.

Benjamin’s work led him to adopt the point
of view that schizophrenics with thinking disorders
as measured by the Rorschach are apt
to follow the classical dementia praecox course, 
whereas those who do not show thinking dis- 
orders on the Rorschach tend to follow an 
atypical course with some degree of recovery. 
We have not presented evidence directly rele- 
vant to this point, but a brief follow-up of our 
patients does indicate a higher rate of discharge 
from the hospital among cases called reactive 
than among the process cases. 

The judgments of the case histories made 
in the present study were based on premorbid 
personality, the nature of the illness to some 
extent, and on some knowledge of the course 
of the disorder. The latter information was 
a part of our histories since most of the cases 
had been hospitalized in the service; better 
samples of cases would be groups of first-ad- 
mission cases. The judgments of the Rorschach 
performances in this study were based in part 
on criteria of thinking disorders, but also upon 
general personality structure and function. 
The agreement with the results of Benjamin 
[2] is found even though the methods are not 
strictly parallel. This suggests that the general 
conclusions can be held with some confidence. 

A brief comment on the lack of agreement 
between the Rorschach and the clinical diag- 
noses is perhaps in order. Suffice it to say that 
such agreement could undoubtedly be system- 
atically varied depending on the particular 
diagnostic approaches used. There is no reason 
to believe that the diagnostic approaches at the 
hospital where this study was made are atypi- 
cal in any important ways. It would seem that 
the Rorschach alone, as conventionally used, 
would be an inadequate criterion on which to 
base a psychiatric diagnosis. It seems equally 
clear that the Rorschach provides a relatively 
economical way of adding important data re- 
garding the particular variety of schizophrenia 
being dealt with. 

An additional implication of the results of 
the present study, taken with other evidence, 
is that diagnosis as usually practiced is not 
sufficiently specific to give maximum differenti- 
ating information about schizophrenic patients. 

Summary 

Two samples of schizophrenic patients were 
studied. In each case, the patient was classified 
as reactive or process on the basis of the history 








162 


and psychotic or nonpsychotic on the basis of 
the Rorschach. The following results were ob- 
tained: 


1. Schizophrenic cases can be reliably dif- 
ferentiated into process and reactive groups. 

2. Those cases classified as reactive from 
histories tend to be called nonpsychotic from 
the Rorschach. 

3. Those cases classified as process from 
histories tend to be called psychotic from the 
Rorschach. 

These results support the view that the diag- 
nostic category of schizophrenia can be legiti- 
mately elaborated to include the classifications 
reactive and process. There is evidence that 
process and reactive cases differ in their psy- 
chological-function characteristics. 

On the basis of these results, further analyti- 
cal study of the reactive-process hypothesis is 
indicated. The full implications of the agree- 
ments demonstrated can be derived only from 
these further studies. 


Received October 24, 1952. 


References 


1. Bellak, L. Dementia praecox. New York:
Grune & Stratton, 1948.

2. Benjamin, J. D. A method for distinguishing
and evaluating formal thinking disorders in
schizophrenia. In J. S. Kasanin (Ed.), Language
and thought in schizophrenia. Berkeley:
Univer. Calif. Press, 1946. Pp. 66-71.

3. Blair, D. Prognosis in schizophrenia. J. ment.
Sci., 1940, 86, 378-477.

4. Bleuler, E. Dementia praecox oder Gruppe der
Schizophrenien. Leipzig: Deuticke, 1911.

5. Bleuler, E. The physiogenic and psychogenic
in schizophrenia. Amer. J. Psychiat., 1930, 10,
203-211.

6. Bleuler, E. Textbook of psychiatry. Trans. by
A. A. Brill. New York: Macmillan, 1936.

7. Ebaugh, F. G., & Benjamin, J. D. Trauma and
mental disorders. In L. Brahdy & S. Kahn
(Eds.), Trauma and disease. Philadelphia:
Lea & Febiger, 1941. Pp. 272-273.

8. Hoch, A. Constitutional factors in the dementia
praecox group. Rev. Neurol. Psychiat.,
1910, 8, 463-474.

9. Hunt, R. C., & Appel, K. E. Prognosis in
psychoses lying midway between schizophrenia
and manic-depressive psychoses. Amer. J.
Psychiat., 1936, 93, 313-339.

10. Kant, O. Differential diagnosis of schizophrenia
in light of concepts of personality-stratification.
Amer. J. Psychiat., 1940, 97, 342-357.

11. Kant, O. Problem of psychogenic precipitation
in schizophrenia. Psychiat. Quart., 1942, 16,
341-350.

12. Kasanin, J., & Veo, L. A study of the school
adjustment of children who later in life become
psychotic. Amer. J. Orthopsychiat., 1932,
2, 212-227.

13. Klopfer, B., & Kelley, D. G. The Rorschach
technique. Yonkers, N. Y.: World Book Co.,
1942.

14. Kraepelin, E. Dementia praecox. Trans. from
8th German Ed. of Textbook of psychiatry.
Edinburgh: E. J. Livingstone, 1918.

15. Kraepelin, E. Dementia praecox and paraphrenia.
Trans. by Mary Barclay. Edinburgh:
E. S. Livingstone, 1919.

16. Kretschmer, E. Physique and character. New
York: Harcourt, Brace, 1925.

17. Langfeldt, G. Prognosis in schizophrenia and
factors influencing course of disease: Catamnestic
study, including individual re-examinations
in 1936 with some considerations regarding
diagnosis, pathogenesis and therapy. Acta
Psychiat. Neurol., Supp. 13, 1937. Pp. 1-228.

18. Langfeldt, G. Prognosis and diagnosis of
schizophrenia. Norsk. mag. f. laegevidensk.,
1938, 99, 589-609.

19. Langfeldt, G. The diagnosis of schizophrenia.
Amer. J. Psychiat., 1951, 108, 123-125.

20. Lewis, N. D. C. Research in dementia praecox.
New York: National Committee for Mental
Hygiene, 1936.

21. Mauz, F. Die Prognostik der endogenen
Psychosen. Leipzig: G. Thieme, 1930.

22. Meduna, L. J., & McCullough, W. S. The
modern concept of schizophrenia. Med. Clin.
N. Amer., 1945, Chicago Number, 147-164.

23. Meyer, A. Fundamental conceptions of dementia
praecox. Brit. med. J., 1906, 2, 757-760.

24. Milici, P. Postemotive schizophrenia. Psychiat.
Quart., 1939, 13, 278-293.

25. Paskind, H. A., & Brown, M. Psychoses resembling
schizophrenia occurring with emotional
stress and ending in recovery. Amer. J.
Psychiat., 1940, 96, 1379-1388.

26. Wittman, Phyllis. Scale for measuring prognosis
in schizophrenic patients. Elgin State
Hosp. Papers, 1941, 4, 20-33.

27. Wittman, Phyllis. Diagnostic and prognostic
significance of the shut-in personality type as a
prodromal factor in schizophrenia. J. clin.
Psychol., 1948, 4, 211-214.

28. Wittman, Phyllis, & Steinberg, D. L. Follow-up
of objective evaluation. Elgin State Hosp.
Papers, 1944, 5, 216-227.

29. Wittman, Phyllis, & Steinberg, D. L. Study of
prodromal factors in mental illness with special
reference to schizophrenia. Amer. J.
Psychiat., 1944, 100, 811-816.


Journal of Consulting Psychology 
Vol. 17, No. 3, 1953 


Prognostic Criteria in the Case Histories 
of Hospitalized Mental Patients¹


G. R. Pascal,² C. H. Swensen²


The University of Tennessee 


Dorothy A. Feldman, Mary E. Cole, and Jean Bayard 


University of Pittsburgh 


The purpose of the present study is to re- 
port the relationships between certain aspects 
of information obtainable from the records of 
hospitalized mental patients and the outcome 
of treatment. The feasibility of such a project, 
and the many objections which can be raised 
against such an approach to prognosis, are out- 
side the province of this study. Suffice it to say 
that in the course of planning the present study 
over 500 published papers were read which 
used information from case records in an at- 
tempt to find prognostic criteria. In spite of 
this voluminous literature, however, there is a 
distressing dearth of definitive studies. 

The data upon which this study is based are 
precarious. In the seven years covered by the 
study different psychiatrists and psychiatric 
social workers of varying conceptions and ca- 
pacities contributed to the materials which 
make up the case histories. Admission policies 
have changed. Ideas of treatment have fluctu- 
ated. Similar criticisms apply to other similar 
studies. Nevertheless, this approach to the 
study of prognostic criteria continues to be 
popular and has, indeed, resulted in several 
generally accepted and repeatedly found prog- 
nostic criteria, e.g., duration of illness [3, 24, 
36, 38, 50], presence of affect [51, 49, 31, 27], 
and acuteness of onset [26, 45, 46, 48]. It is
our contention, however, that the methodology
usually employed does not make maximum use
of the data and, in addition, tends to obscure
rather than define the relationships which may
exist between variables and outcome of
treatment.


¹The authors acknowledge their indebtedness to
Drs. H. Brosin, Director, and F. Weniger, Clinical
Director of the Western Psychiatric Institute,
University of Pittsburgh, for permission to use the
records. Special thanks are due Miss Charlotte
Battles, secretary, for assistance in data gathering and
as secretarial overseer of the project.

²At the University of Pittsburgh when this research
was done.

We may, to illustrate our point, consider a 
specific variable, duration of illness, which has 
been generally found to be related to outcome 
of treatment both for the schizophrenic [3, 4] 
and manic-depressive [43] disorders. The us- 
ual procedure is to relate duration of illness to 
outcome of treatment without consideration of 
other variables which might affect outcome of 
treatment. The general finding has been that 
the shorter the duration the greater the chance 
of the patient to recover from his illness. Such 
findings have led to the oft held belief that 
early treatment is the key to recovery in mental 
illness. There is some evidence, on the other 
hand, to indicate that duration of illness as a 
prognostic variable may be an artifact. Witt- 
man [53], for instance, in her extensive studies 
of prognosis, found a number of process schizo- 
phrenics with unfavorable outcome to be in the 
short-duration group. In a study comprising 
960 cases, she found 134 with short duration 
who did not improve. She feels that by the 
usual statistical methods employed, the patients 
with a longer duration of illness include a 
heavier weighting of process schizophrenics, 
which gives long duration its poor prognosis. 

The need is clear, we think, for better-con- 
trolled studies of such variables as duration of 
illness, mode of onset, age of onset, marital 
status, precipitating stress, etc. If data obtained 
from case records are to be used in a search
for prognostic criteria, then more valid methods
of study should be employed. When a
variable such as duration of illness is studied, 
other variables, known to be related to out- 
come of treatment, need to be held constant. 
Such a statement does not mean to imply that 
all the variables which may affect outcome of 
treatment can be held constant while a single 
variable is being studied. Sufficient knowledge
to encompass such a possibility, if it were 
feasible, is not now available. We do wish to 
state, however, that a much clearer notion of 
the relationship between duration of illness and 
outcome of treatment can be obtained if all the 
other variables obtainable from a case record 
and known to affect outcome are held constant 
while the duration is varied. 

The present study, of which this paper is
the first in a series, attempts a controlled study
of prognostic criteria obtained from case-record
material. Variables known to be significantly
related to outcome of treatment are extracted 
from case records. Single variables are studied 
while other significant variables are, insofar as 
possible, held constant. 


Method and Procedure 


The Western Psychiatric Institute and 
Clinic (WPIC) was opened in November, 
1942. The records of all patients admitted to 
the hospital who were discharged not later 
than July, 1950 (to allow for a minimum of 
one year follow-up) were examined. These 
comprised 1118 records. The following selec- 
tive criteria were now applied to the records: 
age—15 to 55 
lack of known cortical damage 
absence of mental deficiency 
absence of incapacitating somatic illness 
absence of previous attacks of mental illness. 
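
The five selective criteria amount to a simple filter over the records. A minimal Python sketch of such a filter follows. The record type and its field names are hypothetical, and treating the age limits as inclusive is an assumption, since the text says only "15 to 55."

```python
from dataclasses import dataclass


@dataclass
class CaseRecord:
    # Hypothetical fields mirroring the five selective criteria.
    age: int
    cortical_damage: bool
    mental_deficiency: bool
    incapacitating_somatic_illness: bool
    previous_attacks: bool


def meets_criteria(rec: CaseRecord) -> bool:
    """True if the record passes all five selective criteria."""
    return (
        15 <= rec.age <= 55  # inclusive limits are an assumption
        and not rec.cortical_damage
        and not rec.mental_deficiency
        and not rec.incapacitating_somatic_illness
        and not rec.previous_attacks
    )


def select(records):
    """Keep only the records that meet every criterion."""
    return [rec for rec in records if meets_criteria(rec)]
```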


Table 1

Total Population: Status One Year after Discharge
According to Diagnostic Category

                                    Sex        Mean  Mean ed.    Status 1 yr.
                                                age  (grades)    after disch.
Diagnostic Category             N    M    F                      Imp.  Unimp.

Schizophrenics
  Paranoid                    123   67   56    33.2      10.6      36      87
  Catatonic                    76   35   41    24.8      10.8      38      38
  Hebephrenic                  32   19   13    25.9       9.8       3      29
  Simple                       11    7    4    26.8      13.2       6       5
  Mixed                        22    7   15    29.3      11.3       8      14
  Total schizophrenic         264  135  129    29.3      10.7      91     173
Manic-depressives
  Manic                        18    6   12    28.0      11.7      17       1
  Depressed                    19   10    9    36.6      10.0      10       9
  Total manic-depressive       37   16   21    32.4      10.8      27      10
Involutional melancholia       21    3   18    44.3       7.8      14       7
Total affective psychoses      58   19   39    36.7       9.7      41      17
Paranoid condition             14    8    6    39.7      11.1       4      10
Other psychotics                3    2    1    32.3      10.7       2       1
All psychotics                339  164  175    31.0      10.5     138     201
Psychoneurotics
  Obsessive-compulsive         20    9   11    32.6      11.0      15       5
  Hysteria                     11    3    8    27.2      10.4       6       5
  Anxiety                       9    5    4    29.7      11.7       7       2
  Reactive depression          20    7   13    34.0      11.5      17       3
  Hypochondriasis              10    6    4    37.8       8.4       7       3
  Mixed                        25   10   15    33.6       9.7      19       6
  All psychoneuroses           95   40   55    32.8      10.5      71      24
Behavior disorders             52   25   27    23.4       9.6      25      27
All nonpsychotic              147   65   82    29.5      10.2      96      51
All patients                  486  229  257    30.5      10.4     234     252







The last criterion requires some explanation. 
The WPIC is primarily a referring hospital. 
Most of its patients come from other hospitals. 
The criterion “absence of previous attacks of 
mental illness” does not, therefore, mean that
the patient was first hospitalized at WPIC. It 
means that upon admission to WPIC the pa- 
tient was suffering from one continuous attack 
of mental illness without known prior attacks 
which resulted in referral to a psychiatrist. Ap- 
plication of these criteria left 486 cases avail- 
able for further study. 

Table 1 shows the composition of the 486 
cases. The table shows the sex, mean age, mean 
education in grade years, and status one year 
after discharge by diagnostic category. Our 
interest, at this point of discussion, centers 
about the representativeness of the sample with 
respect to outcome of treatment. The sample 
contains 264 schizophrenics. Of these, 91 or 34 
per cent improved. This figure compares fav-
orably with that of Taylor and Von Salzen
[48] whose study contained 1100 schizophren- 
ic patients, 37 per cent of whom were able to 
complete, successfully, one year of parole. 
Within the schizophrenic group, only the cata- 
tonics and paranoids are in sufficient number 
to relate to other studies. Of the 123 paranoids, 
36 or 29 per cent improved. Of the 76 cataton- 
ics, 38 or 50 per cent improved. The differ- 
ential between catatonics and paranoids is in 
keeping with expectancy based on results ob- 
tained by other investigators [40, 33]. 
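
The percentages quoted for the schizophrenic group follow directly from the improved and unimproved counts in Table 1. A short Python check is sketched below; the dictionary layout and function name are illustrative choices, and only the counts come from the table.

```python
# (improved, unimproved) counts taken from Table 1.
schizophrenic_counts = {
    "Total schizophrenic": (91, 173),
    "Paranoid": (36, 87),
    "Catatonic": (38, 38),
}


def percent_improved(improved: int, unimproved: int) -> float:
    """Improved cases as a percentage of all cases in the category."""
    return 100.0 * improved / (improved + unimproved)


rates = {
    name: round(percent_improved(imp, unimp))
    for name, (imp, unimp) in schizophrenic_counts.items()
}
```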

The sample contains 58 cases of affective 
disorders of various types. Of these, 41 or 70 
per cent improved. The total number of cases 
is too small for any valid comparisons, but it 
is worth noting that the increased rate of im- 
provement in contrast with the schizophrenic 
group is in accordance with expectancy both 
for the involutional and manic-depressive dis- 
orders [20, 17, 12]. 

Of the 147 nonpsychotic cases, 96 or 65 per 
cent improved. Ross [39], in a one-year fol- 
low-up of 1186 hospitalized neurotics, found 
70 per cent improved. The several subtypes in 
this category contain too few cases for valid 
comparisons with other findings. It is worth 
noting, however, that the highest recovery rate 
is found for reactive depressions, which is in 
accordance with expectancy [23].

These findings lend credence to the validity
of our criteria, a crucial consideration in a 
study attempting to relate case-history variables 
to outcome of treatment. This correspondence 
between our findings and those of other in- 
vestigators may be thought of as an estimate 
of the reliability of our criteria of improve- 
ment. 

Criteria of improvement. A patient was 
considered improved if, one year after dis- 
charge from the WPIC, he was working at a 
level of efficiency roughly equivalent to that 
which obtained prior to illness. He was con- 
sidered unimproved if, one year after discharge 
from the hospital, he was in another mental 
hospital, or, if not in a hospital, then incapaci- 
tated to the extent of being unable to be gain- 
fully employed, or manifesting other behavior 
indicative of inability to cope with his environ- 
ment in comparison with expectation of a per- 
son of like status. Only the dichotomy “im- 
proved”—“unimproved” was used. Borderline 
cases were judged, on the basis of available 
evidence, to fall in one category or the other. 
Most judgments, particularly for the im- 
proved, were made on the basis of observations 
made by psychiatrists or psychiatric social 
workers and recorded in the case records. The 
unimproved were judged, for the most part, on 
the basis of reports from other mental hospi- 
tals to which the patients had been transferred. 
This procedure was followed for 445 cases on 
whom follow-up material was available one 
year after discharge. Forty-one cases were 
judged by the following criteria: if the patient, 
on discharge from this hospital, was trans- 
ferred to another hospital, unimproved with 
recommendation for commitment, and no fol- 
low-up information was available, the patient 
was judged unimproved; if the patient was dis-
charged from this hospital, improved, and re-
leased to return to his previous or other similar
environment, and no follow-up information 
was available, the patient was judged im- 
proved. 
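
The judging procedure can be summarized as a two-stage rule: follow-up evidence is used when it exists, and the two discharge-based rules apply only when it does not. The Python sketch below is one hedged reading of that procedure. The function name, argument names, and string labels are hypothetical, and returning None for combinations the text does not mention is an assumption.

```python
from typing import Optional


def status_one_year(
    followup: Optional[str],
    discharge_status: str,
    disposition: str,
) -> Optional[str]:
    """Classify a patient as "improved" or "unimproved" one year after discharge.

    followup: judgment from one-year follow-up material, or None if no
        follow-up information was available.
    discharge_status: "improved" or "unimproved" at discharge.
    disposition: hypothetical label for what happened at discharge.
    """
    if followup is not None:
        return followup  # the 445 cases with follow-up material
    if discharge_status == "unimproved" and disposition == "transferred_for_commitment":
        return "unimproved"
    if discharge_status == "improved" and disposition == "released_to_previous_environment":
        return "improved"
    return None  # combination not covered by the stated rules
```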

The variables. The variables used in the
study represent a compromise between those
variables deemed important from a consideration
of evidence in the literature and the practical
job of culling reasonably reliable information
from the records. The variables are listed
below. Definitions are given where indicated.
In some cases only a brief, orienting definition
is given; these are the variables which will
be given later, fuller treatment.


1. Sex

2. Age at admission to WPIC

3. Marital status. Three categories: single, married, divorced or
widowed

4. Religion. Catholic, Protestant, Jewish, or Greek Catholic

5. Nationality, of the patient or his parents. American; North
European, including France, Germany, British Isles, Low Countries,
Scandinavia, Austria; South European, including Italy, Balkan
countries, Spain, and Portugal; East European, including Russia,
Poland, Czechoslovakia, Hungary, and Baltic States

6. Occupation. Four categories: professional, including teachers,
physicians, lawyers, engineers, scientists, artists, musicians;
semiprofessional, including businessmen, salesmen, college students,
bank tellers, dieticians, secretary (college graduates), etc.;
skilled, including machinists, electricians, stationary engineers,
cranemen, clerks, chauffeurs, stenographers, miners, high-school
students; and unskilled, including day laborers, handymen, farmhands,
domestic, unemployed, grade-school students

7. Economic level. Four categories: comfortable, average, marginal,
dependent

8. Body type. Only height and weight were available in the records.
With these data only a crude estimate of body type could be attempted.
Following Sheldon’s suggestion [41] that height over cube root of
weight would give an estimate of body type for the earlier years, this
ratio was calculated for all patients under 30 years of age.

9. Education. Calculated by grades

10. Intellectual status. IQ was not used. In an attempt to obtain a
more valid measure of this variable, the clinician’s estimate of the
patient’s level was taken rather than the IQ reported. Where the
patient had not been seen by a psychologist, then the authors made a
judgment based on the case-history material. Categories used were
above average, average, and below average.

11. Affective expression. A nine-point scale was constructed to rate
this variable. Briefly, an attempt was made to rate the amount of
affect displayed by the patient. This variable will receive detailed
treatment in a later paper of the study [2].

12. Orientation. The psychiatrist’s report as to whether or not the
patient was oriented for time, place, or person was used to place the
patient into one of two categories as follows: oriented, not oriented.

13. Direction of aggression. A nine-point scale was constructed to
rate this variable. Briefly, an attempt was made to rate whether the
direction of aggression displayed by the patient was towards himself
or others. This variable will receive detailed treatment in a
separate, later paper [10].

14. Diagnosis. The patient’s final diagnosis, reached by agreement
between the attending psychiatrist and the clinical director, or on
the basis of a staff conference attended by several psychiatrists

15. Patient’s age at onset of mental illness. The opinion of the
attending psychiatrist and/or psychiatric social worker

16. Type of onset. A nine-point scale constructed to rate the
continuum running from acute to insidious onset. This variable will
receive detailed treatment in a separate paper [47].

17. Duration of the patient’s illness. The opinion of the attending
psychiatrist and/or psychiatric social worker and reported as such in
the case record. This variable will be the subject of a later paper.

18. Total time of hospitalization. The total time, in months, spent by
the patient in a mental hospital

19. Status on discharge from WPIC. The psychiatrist’s opinion of the
patient’s condition immediately upon discharge. Categorized as
improved or unimproved.

20. Precipitating stress. A nine-point scale constructed to rate the
amount of environmental stress experienced by the patient immediately
preceding the onset of illness. This variable will receive detailed
treatment in a separate paper [6].

21. Amount of treatment received by the patient. Three-point scale.
Point one: routine hospital care, including psychiatric interviews,
occupational therapy, and hydrotherapy. Point two: electroconvulsive
therapy, insulin therapy, metrazol therapy, etc. Point three: somatic
therapies plus prefrontal lobotomy.
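
The body-type estimate described under variable 8 is a single ratio, height divided by the cube root of weight. A minimal Python sketch follows. The function name is hypothetical, and the units are an assumption: the text does not state them, and the ratio's numerical scale depends on whichever units are chosen.

```python
def height_weight_ratio(height: float, weight: float) -> float:
    """Height over the cube root of weight.

    Units are not stated in the text; inches and pounds are used in the
    example below purely as an assumption.
    """
    if height <= 0 or weight <= 0:
        raise ValueError("height and weight must be positive")
    return height / weight ** (1.0 / 3.0)


# Hypothetical patient: 70 in. tall, weighing 125 lb.
example_ratio = height_weight_ratio(70.0, 125.0)
```

Because the ratio was calculated only for patients under 30 years of age, any code applying it to a full sample would need to filter on age first.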


This paper of the series will concern itself 
with reporting the relationship between each 
of these variables and the outcome of treat- 
ment, using appropriate statistical techniques. 
It will, in addition, report the interrelation- 
ships between those variables found to be sig- 
nificantly related to outcome of treatment. For 
this purpose a probability value of .01 or less is considered in-
dicative of a significant relationship. Later
papers will report on single variables and their 
relationship to outcome of treatment when all 
the other variables found to be similarly re- 
lated are held constant, insofar as possible. A 
final paper will present a discussion of the en- 
tire study, summary, and conclusions. 


Results and Discussion 


Each of the 21 variables was related to 
status one year after discharge from the hospi- 
tal. Where the independent variable was cate- 
gorized, chi square was used; where it was 
continuous, the biserial correlation coefficient
was used. Table 2 shows the results of statisti- 
cal calculations. Of the 21 variables, 10 seem 
to be unrelated to the criterion, with probabil- 
ity values of .02 or greater. These variables are 
sex, age, religion, nationality, occupation, eco- 
nomic level, body type, education, intellectual 
status, and age of onset. We shall not consider 
these variables further except to note that simi- 
lar results are reported in the literature for 
each finding [43, 45, 42, 4, 16] except that for 
body type. For what it is worth the low re- 
lationship obtained tends to agree with Malam- 
ud and Render [28] and Kline and Tenny 
[25], who report a tendency for patients high 
in endomorphy to have a poor prognosis. 
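
The text names the biserial correlation coefficient without giving the formula used. A common textbook form is r_bis = ((M1 - M0) / s) * (p * q / y), where M1 and M0 are the mean scores of the improved and unimproved groups, s is the standard deviation of all scores, p and q are the proportions in the two groups, and y is the ordinate of the standard normal curve at the point dividing p from q. The Python sketch below assumes that form and the population standard deviation; both are assumptions, and the sample data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def biserial(scores, improved):
    """Biserial correlation between a continuous score and a dichotomy.

    scores: continuous values (for example, age in years).
    improved: booleans of the same length, True for the improved group.
    """
    group1 = [x for x, flag in zip(scores, improved) if flag]
    group0 = [x for x, flag in zip(scores, improved) if not flag]
    if not group1 or not group0:
        raise ValueError("both outcome groups must be non-empty")
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores must vary")
    p = len(group1) / len(scores)
    q = 1.0 - p
    normal = NormalDist()
    y = normal.pdf(normal.inv_cdf(p))  # normal ordinate at the p/q cut
    return (mean(group1) - mean(group0)) / sd * (p * q / y)


# Hypothetical scores and outcomes, for illustration only.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
improved = [False, True, False, True, False, True, False, True]
r_bis = biserial(scores, improved)
```

The biserial coefficient assumes a normally distributed variable underlying the dichotomy, and with strongly separated groups it can exceed 1, so values should be read with that in mind.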


Table 2

Independent Variables Related to
Status One Year after Discharge from WPIC

Variable                         Statistic              p Value

 1. Sex                          chi square = 5.59      .02
 2. Age                          r_bis = .04            .05
 3. Marital status               chi square = 12.87     .01
 4. Religion                     chi square = 7.48      .10
 5. Nationality                  chi square = .90       .90
 6. Occupation                   chi square = 5.12      .20
 7. Economic level               chi square = 4.57      .30
 8. Body type                    r_bis = .08            .05
 9. Education                    r_bis = .10            .05
10. Intellectual status          chi square = 3.30      .20
11. Affective expression         r_bis = .32            .01
12. Orientation                  chi square = 16.54     .001
13. Direction of aggression      r_bis = .67            .01
14. Diagnosis                    chi square = 49.45     .01
15. Age of onset                 r_bis = .09            .05
16. Type of onset                r_bis = .31            .01
17. Duration of illness          r_bis = .38            .01
18. Hospital stay                r_bis = .37            .01
19. Status on discharge          chi square = 244.65    .001
20. Precipitating stress         r_bis = .25            .01
21. Treatment                    chi square = 16.42     .001





At least two impressive recent studies, how- 
ever [29, 8], and a third older study [21] sug- 
gest that outcome of treatment may be related 
to intellectual status. We suggest the possibil- 
ity that if obtained IQ’s are used as an esti- 
mate of intellectual status, a relationship may 
be found between IQ scores and unfavorable
outcome, because of the possible relationship 
between severity of illness and psychological 
deficit [19, 37, 35]. The discrepancy between 
the present findings and those reported else- 
where may be due to the fact that in the pres- 
ent study an attempt was made to assess intel- 
lectual status prior to the onset of illness. 

Eleven variables were found to be related 
to status one year after discharge at acceptable 
levels of confidence. Table 4 shows the perti- 
nent interrelationships between variables. Five 
of these, affective expression, direction of ag- 
gression, type of onset, duration of illness, and 
precipitating stress, will be the objects of de- 
tailed study and presented in separate papers. 
The other six will be dealt with here. 

Marital status. The results obtained, ex- 
pressed in percentages, are contained in Table 
3. These results suggest that the divorced or 
separated and the single patients tend to have 
a less favorable prognosis than the married 
ones. These results accord with expectation 
based upon other reports in the literature [30, 
16, 21, 28]. Wall [49] suggests that history 
of some attempt at sexual adaptation tends to 
indicate a good prognosis. 


Table 3
Percentages of Patients Improved and Unimproved
for Marital Status, Orientation, Diagnosis,
Discharge Status, and Treatment

                                    Status 1 yr.
                                    after discharge
Criterion                    N      Imp.     Unimp.

Marital Status
  Single                   275      41%      59%
  Divorced, Separated       41      39       61
  Married                  170      56       44
Orientation
  Oriented                 413      50       50
  Disoriented               73      27       73
Diagnosis
  Schizophrenia            264      34       66
  Other psychoses           75      71       29
  Nonpsychotic             147      65       35
Discharge status
  Improved                 257      80       20
  Unimproved               229       8       92
Treatment Rating
  1                        147      47       53
  2                        300      50       50
  3                         39      16       84













Orientation. This variable seems to be sig- 
nificantly related to status one year after dis- 
charge. The disoriented patients, as defined, 
tend to have an unfavorable prognosis. Little 
relationship is found between orientation and 
outcome if the patient is oriented, as the tabu- 
lation in Table 3 indicates. A number of previ- 
ous reports can be found which tend to support 
these findings [52, 51, 49, 27]. On the other 
hand, an even greater number of previous 
studies seem to argue against acceptance of the 
present findings [31, 22, 16, 13, 5, 4, 3]. The 
weight of the evidence in the literature seems 
to favor a relationship between good prog- 
nosis and confusion, clouding, perplexity, bi- 
zarreness of behavior, in fact, all the behaviors 
which might contribute to what, in the present 
study, has been categorized as disorientation. 
Orientation is, we feel, the kind of variable 
that will give results which are apt to fluctu- 
ate, depending on other characteristics of the 
population sample. 

The prognostic value of disorientation prob- 
ably depends upon how and in what context of 
other variables disorientation is displayed. 
Thus if delusions accompany disorientation, 
then there is evidence to indicate that the type of
delusions displayed influences prognosis [1]. If
hallucinations contribute to disorientation, then 
there are indications that visual hallucinations 
are less ominous than auditory [51]. Table 4 
indicates that orientation is significantly related 
to a number of other variables which are them- 
selves significantly related to outcome. We 
might, for the sake of example, take the rela- 
tionship found between direction of aggression 
and orientation. For those patients categorized 
disoriented, 70 per cent directed their aggres- 
sion towards others, 23 per cent did not direct 
their aggression primarily in any one direction, 
and 7 per cent directed their aggression 
towards themselves. The implication here is 
clear that disorientation in our sample is re- 
lated to behavior indicative of projection. In 
the same way it could be shown that disorien- 
tation is related to lack of precipitating stress, 
lack of affective expression, and the diagnosis 
of schizophrenia. We do not feel, in other 
words, that the fact that a patient is oriented 
or disoriented for time, place, and person at 
the time of the mental status is, by itself, a 
particularly significant datum upon which to 
base prognosis. It is difficult to see how a clear
notion of the relationship between orientation
and outcome can be obtained unless the variables
related to it and outcome are controlled.
By the same token, the significant relationship
found between orientation and outcome points
to the need to control for this variable in
studying the relationship between other variables
and outcome.


Table 4
Interrelationships between Significant Variables with Probability Values†

                Aff.     Orient.  Dir.     Diag.    Onset    Dur.     Prec.    Treat.
                exp.              agg.                       ill.     stress

Aff. exp.       --       61.2     .24*     109.7    .06*     26.9     .18*     15.7
                                                    >.05
Orient.         61.2     --       12.8     55.9     10.6     4.2      6.5      25.4
                                                             >.05     <.05
Dir. agg.       .24*     12.8     --       132.9    .20*     24.3     .23*     18.9
Diagnosis       109.66   55.9     132.9    --       11.9     11.3     12.9     57.6
                                                    <.02     >.05     <.02
Type onset      .05*     10.6     .20*     11.9     --       33       .11*     19.3
                >.05                                                  <.05
Dur. ill.       26.9     4.2      24.3     11.26    78       --       19.3     23.5
                         >.05              >.05
Precip. stress  .18*     6.5      .23*     12.9     .11*     19.3     --       81
                         <.05              <.02     <.05                       >.50
Treatment       15.7     25.4     18.9     57.6     19.3     23.5     81       --
                                                                      >.50
Marital status  35.18    4.21     9.41     5.84     6.10     4.25     22.24    1.44
                         <.20     <.05     <.10     <.05     <.20              <.50

*Correlation coefficients. All other values are chi squares.
†Italicized values indicate significant relationships with probability values less than .01; other p values are

Diagnosis. The patient’s diagnosis is signi- 
ficantly related to status one year after dis- 
charge. Table 3 summarizes the findings. 
These findings are in accordance with expect- 
ancy and corroborating studies have already 
been cited. Table 4 shows that diagnosis is also 
significantly related to affective expression, 
orientation, direction of aggression, and amount 
of treatment given. Of the three categories, 
schizophrenics tend to have less affective ex-
pression, to direct their aggression
towards others, to show more dis-
orientation, and to be given more treatment.
Each of these variables must be controlled in 
any careful study of the relationship between 
diagnostic category and outcome of treatment. 
It is our guess that if such controls are im- 
posed, it is the relationship between diagnostic 
category and such variables as affect, direction 
of aggression, type of onset, etc., which gives 
diagnosis its prognostic validity. It is worth 
noting that, in this study, a nonsignificant re- 
lationship was found between diagnostic cate- 
gory and duration of illness. 
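
The chi-square statistic used for categorized variables can be illustrated with diagnosis and outcome. The Python sketch below computes the ordinary Pearson chi square for a contingency table. The three-by-two table is regrouped from the counts in Table 1 (schizophrenic, all other psychotics, nonpsychotic), and that regrouping is an assumption about how the authors formed their categories. On these counts the statistic comes to about 43.45 with 2 degrees of freedom, which is not the 49.45 shown in Table 2; the source of the difference cannot be determined from the text.

```python
def chi_square(table):
    """Pearson chi square and degrees of freedom for a contingency table.

    table: list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


# Rows: schizophrenic, other psychotic, nonpsychotic (regrouped from Table 1).
# Columns: improved, unimproved one year after discharge.
diagnosis_by_outcome = [
    [91, 173],
    [47, 28],
    [96, 51],
]
chi2, df = chi_square(diagnosis_by_outcome)
```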

Total hospitalization. This variable is sig- 
nificantly related to status one year after dis- 
charge. The longer the hospital stay the more 
tendency for the patient to remain unimproved 
one year after discharge. This variable is so 
highly related to duration of illness and the 
same variables which are related to duration of 
illness that we shall not consider it further, 
here, except to note that total hospitalization 
needs to be controlled when the relationship 
between other variables and outcome is studied. 

Status on discharge. There is a consistent 
tendency for those who are discharged im- 
proved to remain improved for one year after 
discharge. The same tendency applies to those 
discharged unimproved. The prediction of stat- 
us one year after discharge from status on dis- 
charge is quite good, as indicated by the sum- 
mary of findings in Table 3. The implication, 
here, is that the results of this study might also 
apply fairly well to the immediate outcome of
treatment, although with a lesser degree of 
accuracy. 

Treatment. Total treatment, as crudely
rated in this study, seemed to be significantly 
related to outcome. As the summary in Table 
3 indicates, however, most of its significance 
derives from the great percentage of unim- 
proved in the category which comprises the 
lobotomies. There seems to be little relation- 
ship between treatment ratings one and two, 
and outcome. A number of studies can be cited 
which corroborate this finding [14, 18, 15, 9, 
5, 3], i.e., that there is little relationship be-
tween amount of somatic treatment and prog-
nosis. A study by Danziger and Kindwall [7],
however, suggests that the outcome of treat- 
ment in dementia praecox depends on the num- 
ber of treatments either of insulin or electro- 
convulsive therapy, when duration of illness is 
held constant. Unfortunately, our treatment 
data are too crude to enable us to check on this 
very interesting finding with any degree of ac- 
curacy. 

Table 4 shows that the treatment rating is 
significantly related to affective expression, 
orientation, direction of aggression, diagnosis, 
type of onset, and duration of illness. By and 
large, these relationships hold even when the 
lobotomies (treatment rating three) are ex- 
cluded as Table 5, relating to treatment rating, 
indicates. 


Table 5
Treatment Received by Patients
with Various Diagnoses

                              Treatment Rating
                              1              2
                              routine        physical
Diagnosis              N      hosp. care     therapy

Schizophrenia        232      19%            81%
Other psychoses       71      31%            69%
Nonpsychotic         144      57%            43%





It is evident, therefore, that if the relation- 
ship between other variables and outcome is so 
studied, the treatment rating needs to be held 
constant. 


Summary and Conclusions 


Using predetermined criteria, 486 case rec-
ords were selected from the files of the West-
ern Psychiatric Institute and Clinic. These
records comprised all the cases meeting the 
criteria up to July, 1950. All were records of 
patients between the ages 15 and 55, without 
mental deficiency, without known damage to 
the cortex, and without record of previous at- 
tacks of mental illness. The patients’ statuses 
one year after discharge from the hospital were 
determined and categorized improved or un- 
improved. The records were examined and 21 
variables extracted which were, by appropriate 
statistical techniques, related to status one year 
after discharge. Of the 21 variables, 11 were 
found to be of prognostic significance at the 1 
per cent level of confidence. These are marital 
status, affective expression, orientation, direc- 
tion of aggression, diagnosis, type of onset, dur- 
ation of illness, length of hospital stay, status 
on discharge, precipitating stress, and amount 
of treatment. The interrelations between these 
variables are presented and discussed. 

The primary purpose of this study is to 
demonstrate methodology. Emphasis is given to 
the demonstrated need to control for other sig- 
nificant variables while the prognostic signifi- 
cance of a single variable is being studied. The 
present section of the study has been concerned 
with the presentation of significant variables 
and their interrelationships. Other following 
sections will present carefully controlled stud- 
ies of well-defined, single variables and their 
relationship to status one year after discharge. 
A final section will summarize the findings 
and, by statistical techniques, suggest the most 
efficient combination of prognostic variables. 
Received October 6, 1952. 


References 

1. Albee, G. W. The prognostic importance of delusions in
schizophrenia. J. abnorm. soc. Psychol., 1951, 46, 208-212.

2. Bayard, Jean. Studies of prognostic criteria in the case histories
of hospitalized mental patients. V. Affective Expression. Unpublished
master’s thesis, Univer. of Pittsburgh, 1952.

3. Bellak, L. Dementia praecox. New York: Grune & Stratton, 1948.

4. Chase, L. S., & Silverman, S. Criteria in schizophrenia: critical
survey of literature. Amer. J. Psychiat., 1941, 28, 360-368.

5. Cheney, C. W., & Clow, H. E. Prognostic factors in insulin shock
therapy. Amer. J. Psychiat., 1941, 97, 1029-1039.

6. Cole, Mary E. Studies of prognostic criteria in the case histories
of hospitalized mental patients. III. Precipitating stress.
Unpublished doctor’s dissertation, Univer. of Pittsburgh, 1952.

7. Danziger, L., & Kindwall, J. A. Prediction of immediate outcome of
shock therapy in dementia praecox. Dis. nerv. Syst., 1946, 7, 299-303.

8. Dewan, J. G. Intelligence and emotional stability. Amer. J.
Psychiat., 1948, 104, 548-554.

9. Dukor, B. Prognosis. Schweiz. med. Wchnschr., 1939, 69, 45-92.

10. Feldman, Dorothy. Studies of prognostic criteria in the case
histories of hospitalized mental patients. II. Direction of
aggression. Unpublished doctor’s dissertation, Univer. of Pittsburgh,
1952.

11. Fishbein, I. L. Involutional melancholia and convulsive therapy.
Amer. J. Psychiat., 1949, 106, 128-135.

12. Gildea, E. F., & Man, E. B. Methods of estimating capacity for
recovery in patients with manic-depressive and schizophrenia
psychoses. Amer. J. Psychiat., 1943, 99, 496-506.

13. Gold, L., & Chiarello, C. J. Prognostic value of clinical findings
in cases treated with electric shock. J. nerv. ment. Dis., 1944, 100,
577-583.

14. Gottlieb, J. S., & Huston, P. E. Treatment of schizophrenia:
followup results in cases of insulin shock therapy and in control
cases. Arch. Neurol. Psychiat., 1943, 49, 266-271.

15. Gralnick, A. Seven-year survey of insulin treatment in
schizophrenia. Amer. J. Psychiat., 1945, 101, 449-452.

16. Herzberg, F. I. Prognostic variables for electro-shock therapy.
Unpublished doctor’s thesis, Univer. of Pittsburgh, 1950.

17. Hohman, L. B. A review of 144 cases of affective disorders after
seven years. Amer. J. Psychiat., 1937, 94, 303-308.

18. Horwitz, W. A., & Kleimann, C. Survey of cases of dementia praecox
discharged from Psychiatric Institute and Clinic Hospital. Psychiat.
Quart., 1936, 10, 72-86.

19. Hunt, J. McV., & Cofer, C. N. Psychological deficit. In J. McV.
Hunt (Ed.), Personality and the behavior disorders. New York: Ronald,
1944. Pp. 971-1032.

20. Hunt, R. C., & Appel, K. E. Prognosis in psychoses lying midway
between schizophrenia and manic-depressive psychoses. Amer. J.
Psychiat., 1936, 93, 313-339.

21. Jacob, J. S. Prediction of outcome-on-furlough of dementia praecox
patients. Genet. Psychol. Monogr., 1940, 22, 425-453.

22. Kant, O. The evaluation of prognostic criteria in schizophrenia.
J. nerv. ment. Dis., 1944, 100, 598-605.

23. Karagulla, S. Evaluation of electric convulsion therapy as
compared with conservative methods of treatment in depressive states.
J. ment. Sci., 1950, 96, 1060-1091.

24. Kitching, E. H. The prognosis of mental disorders. Clin. J., 1948,
77, 41-51.

25. Kline, N. S., & Tenney, A. M. Constitutional factors in the
prognosis of schizophrenia. Amer. J. Psychiat., 1950, 107, 434-441.

26. Lewis, A. J. Prognosis in schizophrenia. Lancet, 1935, 1, 339-341.

27. Lewis, N. D. C., & Hubbard, L. D. The mechanisms and prognostic
aspects of the manic-depressive-schizophrenic combinations.
Manic-Depressive psychosis. Proc. Ass. Res. nerv. ment. Dis., 1931,
11, 539-608.

28. Malamud, W., & Render, N. Course of prognosis in schizophrenia.
Amer. J. Psychiat., 1939, 95, 1039-1057.

29. Malamud, W., Sands, S. L., Malamud, Irene T., & Powers, P. J. P.
The involutional psychoses: a socio-psychiatric follow-up study.
Amer. J. Psychiat., 1949, 105, 567-572.

30. Malzberg, B. Marital status in relation to the prevalence of
mental disease. Psychiat. Quart., 1936, 10, 243-261.

31. Meduna, L. J. Oneirophrenia. Urbana: Univer. of Illinois Press,
1950.

32. Neumann, E., & Finkenbrink, F. (Statistical studies in spontaneous
remissions in schizophrenia.) Allg. Zchr. f. Psychiat., 1939, 111,
17-46.

33. Osborne, R. L. Prognosis in schizophrenia. J. Amer. med. Ass.,
1940, 114, 846-847.

34. Palmer, D. M., Riepenhoff, J. P., & Hanahan, P. W. Insulin shock
therapy, a statistical survey of 393 cases. Amer. J. Psychiat., 1950,
106, 918-926.

35. Pascal, G. R., & Zeaman, J. B. Measurement of some effects of
electro-convulsive therapy on the individual patient. J. abnorm. soc.
Psychol., 1951, 46, 104-115.

36. Paster, S., & Hotzman, S. C. A study on one thousand psychotic
veterans treated with insulin and electric shock. Amer. J. Psychiat.,
1949, 105, 811-814.

37. Rapaport, D. Diagnostic psychological testing. Chicago: Year Book
Publishers, 1945.

38. Rennie, T. A. C. Prognosis in manic-depressive and schizophrenic
conditions following shock treatment. Psychiat. Quart., 1943, 17,
642-654.

39. Ross, T. A. An enquiry into prognosis in the neuroses. Cambridge:
University Press, 1936.

40. Rupp, C., & Fletcher, E. K. Five to ten years follow-up study of
641 schizophrenic cases. Amer. J. Psychiat., 1940, 96, 877-888.

41. Sheldon, W. H., Stevens, S. S., & Tucker, W. B. The varieties of
human physique. New York: Harper, 1940.

42. Stallworthy, K. R. Prognosis in schizophrenia. New Zealand Med.
J., 1944, 43, 230-232.

43. Strecker, E. A., Appel, K. E., Eyman, E. V., Farr, C. B., LaMar,
N. C., Palmer, H. D., & Smith, L. H. The prognosis in manic-depressive
psychosis. Manic-depressive psychosis. Proc. Ass. Res. nerv. ment.
Dis., 1931, 11, 471-538.

44. Strecker, E. A., & Willey, G. F. An analysis of reversible
“dementia praecox” reactions. Amer. J. Psychiat., 1924, 80, 592-677.

45. Strecker, E. A., & Willey, G. F. Prognosis in schizophrenia.
Schizophrenia. Proc. Ass. Res. nerv. ment. Dis., 1928, 5, 403-431.

46. Sullivan, H. S. The relation of onset to outcome in schizophrenia.
Schizophrenia. Proc. Ass. Res. nerv. ment. Dis., 1931, 10, 111-118.

47. Swensen, C. H. Studies of prognostic criteria in the case
histories of hospitalized mental patients. IV. Type of onset.
Unpublished doctor’s dissertation, Univer. of Pittsburgh

48. Taylor, S. D., & Von Salzen, C. F. Prognosis in dementia praecox.
Psychiat. Quart., 1938, 12, 576-582.

49. Wall, C. Some prognostic criteria for response of schizophrenic
patients to insulin treatment. Amer. J. Psychiat., 1941, 97,
1397-1402.

50. Wilcox, P. H. Shock therapy. In E. A. Spiegel (Ed.), Progress in
neurology and psychiatry. New York: Grune & Stratton, 1949. Pp.
499-528.

51. Williams, R. R., & Potter, H. W. The significance of certain
symptoms in the prognosis of dementia praecox. State Quart., 1921, 6,
361-380.

52. Wittman, Phyllis. A scale for measuring prognosis in schizophrenic
patients. Elgin St. Hosp. Papers., 1941, 4, 20-33.

53. Wittman, Phyllis, & Steinberg, D. L. Follow-up on objective
evaluation of prognosis in dementia praecox and manic-depressive
psychoses. Elgin St. Hosp. Papers., 1944, 5, 216-227.








Journal of Consulting Psychology 
Vol. 17, No. 3, 1953 


An Evaluation of Dymond’s Test of Insight 
and Empathy 


Henry Clay Lindgren 


San Francisco State College 


and Jacqueline Robinson 1 
San Francisco, California 


Most persons who have to do with selection, 
training, or supervision agree that the crucial 
factors which differentiate between good and 
poor workers have to do with personality 
rather than skill and information. This is 
particularly true in the case of workers in such 
fields as teaching, social welfare, nursing, and 
counseling and guidance. However, we have 
had relatively little success so far in devising 
methods and devices which can measure the 
dimensions of personality that are essential 
to professional success with any great degree 
of validity and reliability. 

This study is the report of an attempt to 
develop a method of measuring insight and 
empathy—two qualities or abilities essential 
to the successful conduct of any profession con- 
cerned primarily with interpersonal relations. 


The test or method which appeared promis- 
ing is one devised by Rosalind Dymond [3, 4]. 
Dymond employed a cross-questionnaire type 
of test administered to a group of 80 people, 
subdivided into smaller groups of 4 each. After 
a brief acquaintanceship period, each member 
of the subgroup was asked to compare him- 
self on a five-point scale with each other mem- 
ber of the group with regard to six different 
qualities: superiority, self-assurance, leadership, 
friendliness, sympathy, and tension. He was 
asked to repeat these six comparisons in a 
slightly different way in each of the four sec- 
tions of each questionnaire. In the first sec- 
tion, he was asked to compare himself to an- 
other person in the subgroup, whom we shall 
call the referent; in the second section, he was 
asked to compare the referent to himself; in 
the third section he was to guess what the re- 
ferent would say (“What will he say he is 
compared to you?”); and in the fourth section 
he was asked to guess the reverse of this ques- 
tion (“What will he say you are, compared 
to him?”). After each person in the sub- 
groups had rated or compared himself with 
each other person in his subgroup, the sub- 
groups were rearranged and re-formed in such 
a way that the new subgroups consisted of in- 
dividuals who had not been used as co-refer- 
ents. The process of comparison and rating 
was repeated. In all, the process was per- 
formed three times. In this way, each subject 
became a member of three different groups and 
interacted with nine different referents. 

1 The authors wish to acknowledge the help of 
graduate students at San Francisco State College 
who aided in assembling the data on which this 
study is based: Charles Smith, Jack Nakashima, 
Donald Kase, Donald Crosby, and Richard Gilberg. 

The “empathy” score for each subject was 
obtained by comparing his predictions of what 
referents would say (Sections III and IV of 
the questionnaire) with the actual statements 
of the referents (Sections I and II of the re- 
ferents’ questionnaires). The empathy score 
was the total number of scale-points the pre- 
dictions deviated from the actual responses. 
Thus, the higher the score (i.e., the greater 
the error), the less the “empathy.” 

If empathy is used in the sense employed 
by Dymond, i.e., the “ ‘faculty’ of being able 
to see things from the other person’s point of 
view,” [4, p. 344] this method possesses a cer- 
tain face validity, in that the individual is 
asked to empathize, to predict the thoughts or 
feelings of another, and a check is made to see 
whether he has empathized correctly. 

The “insight” score was obtained by compar- 
ing a subject’s judgments about himself (Sec- 
tion I) to the judgments which others made 
about him (Section II). The concept of in- 
sight employed by Dymond was Allport’s cri- 
terion: the relation of what an individual 
thinks he has to what others think he has [1]. 
Although many would undoubtedly object to 
the narrowness of this definition of insight, it 
is to be granted that Dymond’s method of 
measurement of this factor also has face validi- 
ty, provided this definition is kept in mind. 
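
The two deviation scores described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Sheet`, its field names, and the sample ratings are hypothetical, the five scale positions are coded 1 to 5, and the example covers a single subject-referent pair, whereas the reported scores are totals over all nine referents.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sheet:
    """One person's answers about one referent (six ratings per section, coded 1-5)."""
    self_rating: List[int]            # Section I: how I rate myself compared to him
    rating_of_other: List[int]        # Section II: how I rate him compared to me
    predicted_other_self: List[int]   # Section III: what I think he says about himself
    predicted_other_of_me: List[int]  # Section IV: what I think he says about me


def deviation(a: List[int], b: List[int]) -> int:
    """Total number of scale points by which two sets of ratings differ."""
    return sum(abs(x - y) for x, y in zip(a, b))


def empathy_score(subject: Sheet, referent: Sheet) -> int:
    """Subject's Sections III and IV against the referent's Sections I and II.
    A higher score means a greater error, i.e., less 'empathy'."""
    return (deviation(subject.predicted_other_self, referent.self_rating)
            + deviation(subject.predicted_other_of_me, referent.rating_of_other))


def insight_score(subject: Sheet, referent: Sheet) -> int:
    """Subject's Section I against what the referent says of him (referent's Section II)."""
    return deviation(subject.self_rating, referent.rating_of_other)


# Hypothetical ratings for one pair of subjects.
a = Sheet([3, 4, 3, 4, 4, 3], [3, 3, 3, 4, 4, 3], [3, 4, 3, 4, 3, 3], [3, 3, 2, 4, 4, 3])
b = Sheet([3, 4, 4, 4, 3, 2], [2, 3, 3, 4, 4, 3], [3, 3, 3, 4, 4, 3], [3, 4, 3, 4, 4, 3])

print(empathy_score(a, b), insight_score(a, b))
```

Summing these pairwise values over every referent a subject meets would give totals of the kind reported in the study, with low totals counted as “good” empathy or insight.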


A Revision of the Dymond Test 


The first step to be undertaken in determin- 
ing whether this test could be used as a selec- 
tion device was to try it out in a large-scale ad- 
ministration. Dymond used a complete ques- 
tionnaire of some nine pages for each individu- 
al and each administration. This meant nine 
questionnaires or 81 pages for each individual. 
It was evident that the bulk of paper would be 
large and the scoring cumbersome if her meth- 
od were employed without change. Her meth- 
od was therefore simplified by constructing an 
answer sheet containing the statements as or- 
iginally made on the questionnaire, but with 
the five-point scale rendered as “A B C D E.” 
Thus each individual used the basic question- 
naire much like a test booklet and recorded his 
replies on the answer sheet. 

This format was tried with two groups of 
freshman psychology students with a total en- 
rollment of 60. The mean and standard devia- 
tion of 71.3 and 15.9 of the empathy scores 
compare favorably with the mean of 73.2 and 
the standard deviation of 15.8 reported by Dy- 
mond. 

However, students taking the test were 
quite critical of the items relating to “superi- 
ority,” which they stated had no meaning for 
them, unless it was one of snobbery. Inasmuch 
as the “superiority” statement was the first 
item in each section, it was felt by the experi- 
menter that it might possibly have an adverse 
effect on the cooperation of subjects. Con- 
sequently, the questionnaire and answer sheet 
were revised so that the item relating to “su- 
periority” was changed to one of “intelligence” 
and assigned a medial position in the four sec- 
tions. 

The second revised form of the question- 
naire was administered to two groups of fresh- 
men and one group of upper-division students, 
125 in all. The mean for this group was 
71.23, and the standard deviation was 17.55. 
Here again, figures compare very favorably 
with the scores originally obtained by Dymond. 


Reliability and Validity 


A number of checks were made on the 
validity and reliability of the test. Dymond 
had reported a correlation of .63 between the 
first half and the second half of the empathy 
scores on each sheet, and a correlation of .82 
between scores on items 1, 3, and 5, and scores 
on items 2, 4, and 6 [4]. She also reported 
a test-retest reliability of .60 for an earlier 
form of her test [3]. 

The reliability of the present test was com- 
puted by the split-half method, i.e., by dividing 
each person’s answer sheets into two groups on 
an odd-even basis and correlating the total 
scores for each of the two groups. The nine 
sheets made two groups of five each, the even- 
numbered sheets being supplemented by one 
sheet selected at random from the “odd” 
group. The correlations obtained were .69 for 
the empathy portion of the test and .73 for the 
insight portion (N = 87). 
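
The odd-even procedure just described can be sketched as follows. The function names and the sample data are hypothetical, and the sketch assumes that each person has nine per-sheet scores in order of administration; the random draw that supplements the even group will of course differ from whichever sheet was actually drawn in the study.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def split_half_totals(sheet_scores, rng):
    """Return (odd total, even total) for one person's nine sheet scores.
    The four even-numbered sheets are supplemented by one odd sheet drawn at random,
    so that each half contains five sheets."""
    odd = sheet_scores[0::2]    # sheets 1, 3, 5, 7, 9
    even = sheet_scores[1::2]   # sheets 2, 4, 6, 8
    even = even + [rng.choice(odd)]
    return sum(odd), sum(even)


def split_half_reliability(all_scores, seed=0):
    """Correlate odd-half totals with even-half totals across persons."""
    rng = random.Random(seed)
    halves = [split_half_totals(scores, rng) for scores in all_scores]
    return pearson_r([h[0] for h in halves], [h[1] for h in halves])


# Hypothetical data: four persons whose nine sheets all carry the same score,
# so the two halves agree perfectly.
demo = [[k] * 9 for k in (6, 8, 10, 7)]
print(round(split_half_reliability(demo), 3))
```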

Dymond had also reported that the Wech- 
sler full scale IQ of “good empathizers” was 
132.1 and that of “poor empathizers” was 
126.4, a difference which she described as not 
significant [4]. Inasmuch as she did not re- 
port correlations between Wechsler and em- 
pathy scores, it was difficult to make compari- 
sons. Nevertheless, correlations were computed 
for 75 students between scores received on the 
revised form of the Dymond test and the 
American Council on Education’s Psychologi- 
cal Test for College Freshmen, with the fol- 
lowing results: correlation between “empathy” 
and the ACE (L), .14; between “empathy” 
and ACE (Q), .17; between “empathy” and 
ACE (Total), .14; and between “insight” and 
ACE (Total), .02. These results confirm 
Dymond’s interpretations. 

An investigation was also made of the re- 
lationship between the individual’s rating of 
his own empathy and the score he received on 
the empathy portion of the revised Dymond 
test. The correlation obtained for 48 individu- 
als was —.04. 

Another investigation was made of 45 stu- 
dents who had taken the Minnesota Multi- 
phasic Personality Inventory as well as the re- 
vised form of the Dymond test. Although the 
relationship between empathy scores and scores 
received on the various MMPI scales was not 
thoroughly investigated, it was noted that a 
group consisting of the “poorest empathizers” 
received consistently higher (i.e., more “mal- 
adjusted”) scores on the MMPI than did a 
group consisting of the “best empathizers.” 
These results are presented in Table 1. They 
tend to confirm the general nature of Dy- 
mond’s findings that persons with “poor” em- 
pathy are not as well adjusted, according to 
personality tests, as are persons with “good” 
empathy [4]. 














Table 1 
MMPI Scores* of “Good” and “Poor” Empathizers 

MMPI Scales                  “Good”         “Poor” 
                             Empathizers    Empathizers 
[scale name illegible]           45             48 
[scale name illegible]           52             65 
[scale name illegible]           53             48 
[scale name illegible]           46             51 
[scale name illegible]           56             60 
[scale name illegible]           52             56 
[scale name illegible]           57             67 
[scale name illegible]           62             74 
[scale name illegible]           54             64 
[scale name illegible]           57             67 
[scale name illegible]           60             66 
N†                                7              5 

* Mean standard scores, including “K-correction” where appropriate. 
† Out of a total of 45 freshmen. 


The Effect of Cultural Stereotypes 


In scoring the revised form of the Dymond 
test, the authors of this study became aware 
of the fact that responses to each item did not 
distribute themselves evenly with regard to the 
five points on each scale. There was a definite 
tendency for one of the positions in the scale 
to be preferred with regard to each item. For 
example, with regard to the item on friendli- 
ness (“How friendly or unfriendly do you 
think you are to him or her?’’), there was a 
decided tendency to check the fourth item 
(“fairly friendly”). This observation gave rise 
to the hypothesis that perhaps respondents 
were actually not empathizing at all, but were, 
in effect, responding to a cultural norm or 
stereotype which says that people ought to 
maintain attitudes which are “fairly friendly.” 
To carry this hypothesis further, it is implied 
that individuals in our middle-class culture 
would be under some pressure either to behave 
in ways which are “fairly friendly” or to be- 
lieve that they are “fairly friendly,” i.e., to con- 
ceive of themselves as “fairly friendly” even if 
this were not actually characteristic of their be- 
havior. “Friendliness” occupies a fairly central 
position in the values of our culture; we have 
a high stake in conforming to the cultural 
norms; and each of the personal qualities which 
form the bases of the items in the question- 
naire likewise occupies a central position with 
regard to the norms and values of our culture. 

When this hypothesis was checked with re- 
gard to responses indicated on the answer 
sheets, a rather decided pattern emerged. Ac- 
cording to the patterning of the responses on 
the first section, we are likely to think of our- 
selves as “fairly self-assured” in our relations 
with others; we are “about half and half 
leaders and followers”; we are “fairly friend- 
ly”; we are “neither more nor less intelligent”; 
we are “fairly sympathetic”; we are “fairly re- 
laxed.” Furthermore, a similar sort of pattern- 
ing persisted throughout the remaining three 
sections of the questionnaire. 

On the basis of this analysis, a male key and 
a female key were constructed based on the re- 
sponses of 100 students, equally divided as to 
sex.2 Tests were then rescored in two different 
ways. They were scored by the keys: 1. Ac- 
cording to the original scoring patterns, i.e., 
an “empathy” score was obtained by measur- 
ing the amount of deviation between Sections 
III and IV of the subject’s questionnaires and 
Sections I and II of the key, and “insight” 
was measured by comparing Section I of the 
subject’s questionnaires with Section II of the 
key. 2. The second method of scoring was to 
measure the deviation between each section of 
the subject’s questionnaires and the corres- 
ponding section of the key. This method of 
scoring produced a score which represented the 
tendency of an individual to conform to the 
norm in his responses. 

2 To save printing costs, the questionnaire, answer 
sheet, and keys have been deposited with the Ameri- 
can Documentation Institute. Order Document No. 
3916 from the American Documentation Institute, 
Auxiliary Publications, Photoduplication Service, 
c/o Library of Congress, Washington 25, D. C., re- 
mitting $1.25 for microfilm (images 1 inch high 
on standard 35 mm. motion picture film) or $1.25 
for photostat readable without optical aid. 

Using the first method of scoring, the cor- 
relation between the original empathy scores 
and the “normative” empathy scores was .74. 
The correlation for insight was .60. 

Using the second method the correlation be- 
tween the original empathy scores and the total 
normative score was .56. The correlation with 
insight scores was .51. 
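
The two keyed scorings compared above can be sketched as follows. The report does not state exactly how the keys were built, so the sketch assumes that the key holds the most frequently chosen scale position for each item (with ties resolved arbitrarily); the function names and the two-item sample data are hypothetical, and in practice one key would be built for each sex.

```python
from collections import Counter

SECTIONS = ("I", "II", "III", "IV")


def build_key(norm_sheets):
    """Assumed construction: the modal response to each item in each section."""
    key = {}
    for sec in SECTIONS:
        n_items = len(norm_sheets[0][sec])
        key[sec] = [
            Counter(sheet[sec][i] for sheet in norm_sheets).most_common(1)[0][0]
            for i in range(n_items)
        ]
    return key


def deviation(a, b):
    """Total number of scale points by which two sets of ratings differ."""
    return sum(abs(x - y) for x, y in zip(a, b))


def normative_empathy(sheet, key):
    """First method: subject's Sections III and IV against Sections I and II of the key."""
    return deviation(sheet["III"], key["I"]) + deviation(sheet["IV"], key["II"])


def normative_insight(sheet, key):
    """First method: subject's Section I against Section II of the key."""
    return deviation(sheet["I"], key["II"])


def conformity_score(sheet, key):
    """Second method: each section against the corresponding section of the key."""
    return sum(deviation(sheet[sec], key[sec]) for sec in SECTIONS)


# Hypothetical norm group of three students, two items per section for brevity.
norm_group = [
    {"I": [4, 3], "II": [4, 3], "III": [4, 3], "IV": [4, 3]},
    {"I": [4, 3], "II": [3, 3], "III": [4, 2], "IV": [4, 3]},
    {"I": [5, 3], "II": [4, 3], "III": [4, 3], "IV": [3, 3]},
]
key = build_key(norm_group)
subject = {"I": [5, 3], "II": [4, 2], "III": [3, 3], "IV": [4, 4]}

print(normative_empathy(subject, key), normative_insight(subject, key),
      conformity_score(subject, key))
```

Correlating such keyed scores with the scores obtained against actual referents, over all subjects, would yield coefficients of the kind reported above.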


Discussion 


The results of this investigation indicate 
that the Dymond approach to the measure- 
ment of insight and empathy can be adapted 
to large-scale administration, but the reliability 
is somewhat low for the test to be considered 
useful as a predictive device. However, re- 
liability could probably be increased to a satis- 
factory level by increasing the number of in- 
dividuals with whom the subject rates him- 
self. On the other hand, such a procedure in- 
troduces further complications. The present 
manner of administration requires somewhat 
more than a two-hour period, or three fifty- 
minute class periods. Furthermore, it is diffi- 
cult to maintain the interest of subjects even 
for the three administrations required. On the 
other hand, if the test were actually used as 
a selection instrument, presumably the ego-in- 
volvement of participants would be sufficient 
to maintain interest even if the number of ad- 
ministrations were doubled. 
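
The likely gain from lengthening the test can be roughed out with the Spearman-Brown formula. The authors do not name this formula; it is brought in here only as the customary projection, and it assumes that the added sheets would behave like the existing ones. It is also an assumption that the coefficients of .69 and .73 reported earlier may be entered directly as the starting reliabilities.

```python
def spearman_brown(r, k):
    """Projected reliability when a test with reliability r is lengthened k times."""
    return k * r / (1 + (k - 1) * r)


# Doubling the number of administrations (k = 2), starting from the reported coefficients.
for label, r in (("empathy", 0.69), ("insight", 0.73)):
    print(label, round(spearman_brown(r, 2), 2))
```

On these assumptions, doubling the number of administrations would raise the coefficients to roughly .82 and .84.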

The validity of the test is in graver question. 
If it measures insight according to the defi- 
nition selected by Dymond, it probably covers 
too narrow an aspect of what psychologists 
mean by insight. What we need here is a term 
which means “the tendency to see oneself as 
seen by others,” since this is what the test 
measures. 


Further data are needed as to the extent to 
which the test measures empathy. Dymond’s 
research as well as that of the present study 
indicates that empathy as measured by this test 
is probably related to personal adjustment as 
measured by a variety of measures. However, 
what is needed is an evaluation based on more 
acceptable criteria of empathy. The content- 
free interview method of measuring empathy 
developed by Claire Wright Thompson and 
Katherine Bradway suggests itself as a possible 
criterion [2]. 

The gravest question with regard to validity 
grows from the relationship demonstrated in 
this study between the “normative” scores and 
scores produced by the method developed by 
Dymond. This relationship comes close to the 
reliability of the test. In other words, the test 
correlates almost as well with the cultural 
norm as it does with itself. This raises the 
question of what an individual does who com- 
pares himself with others by the Dymond 
method. Is he responding to his awareness of 
others, or is he comparing his self concept with 
some cultural norm, or is he reacting to some 
stereotyped concept of how people ought to feel 
and act? Or is he reacting defensively because 
the nature of the questions arouses anxiety? 

Furthermore, one should not overlook the 
possibility suggested by Hastorf and Bender 
that “part of the successful prediction of an- 
other person’s responses may be due to pro- 
jection rather than empathy. . . .” [5, p. 576]. 
This hypothesis could account for the high 
correlation between empathy scores and norm- 
ative scores reported in the present study. It 
may be, therefore, that persons show up as 
“good empathizers” on this test if they are con- 
ventional folk who conform rather closely to 
cultural norms and standards and who tend to 
perceive the behavior and attitudes of others 
in the light of these same norms. In other 
words, conventional people get good scores on 
empathy tests because most of their partners 
(or referents) in the test are also conventional. 
Hence the prediction of conventional attitudes 
coincides with the existence of conventional 
attitudes. This would also mean that uncon- 
ventional people would tend to get poor em- 
pathy scores (because they would project devi- 
ating attitudes). Thus when Dymond states 
that “on the whole it is easier to predict the 
responses of a person who is highly empathic” 
[4, p. 345] as measured by this test, she may 
be reporting the discovery that the more con- 
ventional an individual is, the easier it is to 
predict his responses. 

It is quite possible that all these factors op- 
erate. In any case, further research with in- 
struments of this type will have to take into 
consideration the effect of norms and stereo- 
types on the results of the test. Perhaps the 
empathy of an individual can be measured by 
scoring his questionnaires by both the norma- 
tive key and by the Dymond method. Using 
this approach, an individual who has a “good” 
(i.e., low) score on the empathy portion of 
the test, but who has a high score (showing 
marked deviation) from the normative key 
would be a person with much empathy, little 
influenced by the cultural norm. However, it 
would be difficult to demonstrate whether a 
person who receives a low score on each of 
these scorings actually lacks empathy. Perhaps 
what is needed, rather, is a test constructed on 
the same model, consisting of items which are 
less ego-involved, which do not evoke defen- 
siveness or references to cultural norms. How- 
ever, the construction of such a test would not 
be easy, as any item involving interpersonal re- 
lationships would be likely to incur some anxi- 
ety and hence some defensiveness. Neverthe- 
less, such a test would have great value as a 
selection instrument for occupations in which 
empathy is important. Thus it is hoped that 
additional research will produce an instrument 
which is valid and reliable for this purpose. 


Summary 


A minor revision of the method used by 
Rosalind Dymond in the measurement of in- 
sight and empathy was used in an attempt to 
produce an instrument which could be used 
for purposes of selection. Dymond’s findings 
were confirmed that the relationship between 
the factors measured by her test and scores 
received on intellective measures was slight. 
Also confirmed were her findings that persons 
of high empathy have “better” scores on per- 
sonality tests than do persons of low empathy. 
However, the correlation between repeated ad- 
ministrations of the revised questionnaire in- 
dicated that the reliability of the method is too 
low for predictive purposes, although it probably 
would be improved if the total number of ad- 
ministrations were increased. 

3 In a letter to the senior author, Dymond states 
that she has developed a partial-correlation method 
which partials out the stereotype from predictions, 
thus leaving a residual which may have more val- 
idity as a measure of empathy. 

However, the validity of the method is ques- 
tioned in view of the high degree of relation- 
ship between scores received on the revised test 
and those produced by scoring with a norma- 
tive key. The correlation between these scores 
is almost as high as the reliability of the test. 
This raises the question of whether the test 
measures the tendency of individuals to re- 
spond to an interpersonal situation in terms 
of cultural norms rather than empathic 
promptings. It was suggested that both fac- 
tors may operate. Yet until they can be 
measured separately, or until a form of the test 
is developed which does not evoke reference 
to such norms, the use of the present revision 
as a predictive measure of insight or empathy 
appears inadvisable. 


Received October 6, 1952. 


References 


1. Allport, G. W. Personality: A psychological in- 
terpretation. Boston: Houghton Mifflin, 1937. 

2. Bradway, Katherine, & Thompson, Claire 
Wright. A content-free method of training in- 
terviewers. J. consult. Psychol., 1950, 14, 321- 
[end page illegible]. 

3. Dymond, Rosalind F. A scale for the mea- 
surement of empathic ability. J. consult. Psy- 
chol., 1949, 13, 127-133. 

4. Dymond, Rosalind F. Personality and empathy. 
J. consult. Psychol., 1950, 14, 343-350. 

5. Hastorf, A. H., & Bender, I. E. A caution re- 
specting the measurement of empathic ability. 
J. abnorm. soc. Psychol., 1952, 47, 574-576. 


Journal of Consulting Psychology 
Vol. 17, No. 3, 1953 


The Present Status of Research on Nondirective 
Play Therapy 


Dell Lebo 


Florida State University 


To many persons nondirective play therapy 
has seemed easy to learn, pleasant to undertake, 
and gratifying in results. Much of the atten- 
tion attracted to play therapy has resulted from 
its apparent ease as well as from the concept 
that play is the natural medium of expression 
of the child. Play has come to be recognized 
as the most satisfactory way of understanding 
the nonverbal child. Because of the current 
widespread interest in child psychology, child 
training, education, and mental hygiene, recent 
years have seen a great deal of concomitant 
activity in nondirective play therapy. Much of 
the activity has resulted in emotional articles 
lauding nondirective play therapy. These arti- 
cles generally explain the efficacy of nondirec- 
tive play therapy on the basis of philosophical 
constructs arising from the growth principle 
developed from nondirective counseling with 
adults. 

Research in nondirective therapy with adults 
is sound and extensive. Research in nondirec- 
tive play therapy with children is still meager, 
unsound, and frequently of a cheerful, persua- 
sive nature. It has seemed to the present writer 
that such articles could be more correctly class- 
ified as propaganda than as research. 

The present paper is an attempt to review 
current research in nondirective play therapy. 
With one exception, studies involving more 
than one child have been reported here. The 
report giving only a single case history has been 
avoided. Also, only studies concerned with 
nondirective play therapy are reported. There 
is no doubt that play therapy has a considerable 
history. It may be said to extend back to 
Rousseau [23] who studied the play of the 
child to understand his psychology. However, 
nondirective play therapy developed from the 
work of Carl Rogers [21, 22] and his associ- 


ates. One of them, Virginia Axline [2], was 
the first successfully to apply nondirective 
methods to play therapy with children. Ax- 
line’s book, while widely read, is more sug- 
gestive than it is factual. It seems to have been 
the forerunner of much of the persuasive ma- 
terial to be found in studies of nondirective 
play therapy. 

A recent example of the continuing propa- 
gandistic tendencies in nondirective play-thera- 
py research papers is to be seen in an article 
titled: “An Experiment in Play Therapy” 
[11]. The stated purpose was to help children 
who seemed unable to adjust to the school situ- 
ation. The children were selected on the basis 
of Rorschach tests, their teacher’s impression, 
and the therapist’s observations. Two groups 
of five children each were selected for non- 
directive play therapy. The majority of child- 
ren selected had problems of sibling rivalry. 
The groups met once a week for play therapy. 
Much verbatim material from the therapy 
notes was presented to enable the reader to 
partake of the emotional flavor of the situation. 

The shortcomings of the “experiment” are 
serious. First, it lacks a clear hypothesis and 
a control group. Second, the results seem to 
indicate a lack of rigorous method. For it is 
the conclusion of the research that “through 
group work they [the children] learned that 
they were not alone in having ‘bad feelings.’ 
Other children had them too” [11, p. 180]. 
Further that “. . . their natural healthy drive 
toward maturity, which had been retarded, 
could once more assert itself” [11, p. 180]. 
Such conclusions would seem to savor more of 
a desire to support nondirective play therapy 
than they do of experimental procedure. 

What takes place in nondirective play thera- 
py? With the philosophy of the love of child- 
ren, and the idyllic purposefulness of many of 
the typical articles stripped off, nondirective 
play therapy seems to be left a rather thin 
framework. A determination of the process of 
play therapy, as contrasted with the results of 
play therapy, has been subject matter for only 
three known research studies. 


The first such study was the work of Land- 
isberg and Snyder [18]. They attempted to 
analyze what actually took place in client- 
centered play therapy by an objective approach. 
Their procedure was to study the protocols of 
three successful and one incomplete case. Each 
statement made by the counselor was cate- 
gorized as to its content. Statements made by 
the children were categorized as to content, 
emotion expressed, and activity. Although the 
children ranged in age from five to six years, 
the categories used had been developed for em- 
ployment with adult cases. 


They reported finding an increase in the 
child’s physical activity during the last three- 
fifths of therapy. The children were found to 
have released much feeling during therapy. 
About 50 per cent of their actions and state- 
ments during the first two-fifths of treatment 
were devoted to emotional release. This per- 
centage rose to 70 for the last three-fifths of 
the process. It was noticed that negative feel- 
ings particularly increased in frequency. The 
major part of the children’s feelings were di- 
rected towards others and not to themselves 
or to the counselor. No insightful statements 
were made by the children whose records were 
studied. 


Finke [15] did not use adult categories in 
analyzing children’s nondirective play therapy 
protocols, but derived her categories from an 
analysis of children’s statements. Expressions 
of feeling were emphasized as it was believed 
such expression would mirror the child’s 
changing emotional reactions resulting from 
the play therapy. 


She selected complete protocols from six 
play therapists concerning six different children 
referred for behavior problems. The children 
ranged in age from five to eleven years. The 
possibility of bias resulting from one person’s 
categorizing all the cases was avoided by hav- 
ing five students recategorize one or more in- 
terviews chosen at random. Their results 
corresponded adequately with the original cate- 
gorization. 

It was found that different children, under- 
going therapy with different therapists, showed 
similar trends which tended to divide play 
therapy into three stages: 


1. Child is either reticent or extremely talkative. 
He explores the playroom. If he is to show aggres- 
sion at any time during therapy, a great deal of it 
will be exhibited in this stage. 

2. If aggression has been shown, it is now 
lessened. This child tests the limitations of the play- 
room. Imaginative play is frequently indulged in 
here. 

3. Most of the child’s efforts are now expended 
into attempted relationship with counselor. The 
child tries to draw the therapist into his games and 
play. 


Like Landisberg and Snyder [18], Finke 
[15] found no trends for positive statements. 
Unlike them, she found no trends for negative 
statements. The verbal characteristics of adult 
counseling sessions did not appear. Finke con- 
cluded that nondirective play therapy had its 
own characteristic pattern which was repeated 
in case after case. 

Both studies indicated that children’s atti- 
tudes changed during therapy and that the 
changes could be quantitatively reported. 
There seem to be serious limitations in both 
studies. Landisberg and Snyder used adult 
categories. Finke found fault with this. She 
indicated that differences in the age and so- 
phistication of the adults and the children 
would affect the degree and type of verbaliza- 
tion made. Consequently, she felt it was not 
justifiable to evaluate children’s comments on 
the basis of categories derived from adults. 

Finke seemed to have failed to recognize the 
possibility that just as the wide age discrep- 
ancies between adults and children might in- 
fluence the character of their verbalizations, so 
might children’s categories vary significantly 
from one level of maturity to another. 

The present writer [19] undertook a study 
of the possible relationship between chronologi- 
cal age and the types of statements made by 
children in play therapy. He used Finke’s 
[15] categories. 


Twenty children were given three play-therapy 
sessions by the same therapist in the same play- 
room. The children were reasonably equated for in- 
telligence and social adjustment. Five age stages 
were represented with two boys and two girls in 
each stage. Children were selected who were 4, 6, 
8, 10, and 12 years of age. 

Fifteen pages of verbatim style notes were select- 
ed by a table of random numbers from the 166 
pages of protocol. These 15 pages were then cate- 
gorized by three experienced play therapists. Their 
percentages of agreement were adequately similar to 
one another. All of the protocols were then an- 
alyzed by the writer. 

It was found that maturation, as represented by 
chronological age, did seem to account for some 
definite trends in the types of statements made by 
children in the play-therapy situation. 

As the children became older, they told the thera- 
pist fewer of their decisions. They spent less time 
in exploring the limitations. They made fewer at- 
tempts to draw the therapist into their play and 
they expressed more of their likes and dislikes. 
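
The reliability check summarized above, in which 15 of the 166 protocol pages were drawn at random and categorized independently by three judges, can be sketched as follows. The study does not say how the percentages of agreement were computed, so simple statement-by-statement agreement between pairs of judges is assumed here; the function names, judge labels, and category letters are hypothetical.

```python
import random
from itertools import combinations


def sample_pages(total_pages, n, seed):
    """Draw n distinct page numbers at random from pages 1..total_pages."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, total_pages + 1), n))


def percent_agreement(codes_a, codes_b):
    """Percentage of statements placed in the same category by two judges."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


def pairwise_agreement(ratings):
    """Percentage agreement for every pair of judges."""
    return {
        (j1, j2): percent_agreement(ratings[j1], ratings[j2])
        for j1, j2 in combinations(sorted(ratings), 2)
    }


pages = sample_pages(166, 15, seed=1)

# Hypothetical category codes assigned by three judges to the same four statements.
ratings = {
    "judge_1": ["A", "B", "C", "D"],
    "judge_2": ["A", "B", "X", "D"],
    "judge_3": ["A", "B", "C", "D"],
}
print(pages)
print(pairwise_agreement(ratings))
```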


The three studies, while not strictly com- 
parable, would seem to indicate that nondirec- 
tive play therapy is an objectively measurable 
process; that children’s emotional expressions 
are altered in a discernible manner; and that 
maturation appears to be related to the type of 
expression of therapeutic change. Beyond such 
statements the studies substantiate few of the 
philosophical aspects of play therapy. 


The Successfulness of Nondirective 
Play Therapy 


The outcome of play therapy in various 
types of cases. Nondirective play therapy has 
been used in the study and treatment of such 
seemingly diverse problems as allergy, mental 
deficiency, personality problems, physically 
handicapped children, race conflicts, and read- 
ing difficulties. From the published reports 
one receives the impression that it has usually 
been either successful or incomplete. The 
children are seemingly relieved of their present- 
ing symptoms or the therapy is unavoidably in- 
terrupted in a promising but unfinished phase. 


Nondirective play therapy in the treatment 
of allergy. Miller and Baruch [20] following 
a successful preliminary psychotherapeutic 
treatment of allergy [8] undertook to treat six 
children under eleven years of age by play 
therapy. All their subjects had classical allergic 
symptoms confirmed by positive skin reactions 
to various allergens. Prior to nondirective play 
therapy all the subjects had been unsuccessfully 
treated medically. 

They cite as a representative case a five-year-old 
asthmatic boy who used attacks of asthma 
to gain contact with his mother. Whenever 
she left him, the asthma would express his hostile 
feelings. His asthmatic attacks cleared after five 
months of play therapy. Unfortunately, for the purposes 
of this investigation, 16 allergic adults were included 
in the results. As a result, it can only be said that 
of the 22 patients 
(including the six children), 21 showed im- 
provement while one was unchanged. 

Nondirective play therapy in the study of 
mental deficiency. Exploratory material sug- 
gestive of the emotional factors in mental de- 
ficiency is presented by Axline [5]. In a re- 
port of an examination of selected play-therapy 
protocols, evidence is offered which indicates 
marked improvement in some IQ scores after 
completing play therapy. 

The verbatim stenographic reports of 15 six- 
and-seven-year-old children referred for be- 
havior problems were studied. Each child had 
been seen individually by the same therapist 
for 8 to 20 contacts. The reports were selected 
and analyzed at some time after therapy on the 
basis of Stanford-Binet IQ ratings and the age 
of the children. The records were then 
grouped as follows: 


1. Children who showed no appreciable change 
in IQ scores after therapy. Pre- and posttesting 
indicated low intelligence. 

2. Children who showed a gain in IQ scores af- 
ter therapy. Pretests were low; posttests were in- 
dicative of normal intelligence. 

3. Children with average intelligence both before 
and after play therapy. Children from this group 
had play therapy in a children’s home. 


In every case of those children whose IQ 
stayed low, their mother had indicated shame, 
disapproval, and rejection. It was felt that the 
children’s difficulty lay in their daily lives. 
“They were not able to communicate clearly 
to others the things that were uppermost in 
their lives” [5, p. 528]. The therapist felt that 
each of the cases in the first category was in- 
complete. However, it was impossible to finish 
therapy. 

In the case material presented, it is evident 
that both the children whose IQ’s did not im- 
prove and those whose IQ’s became normal in- 
itiated play activity. Both groups freely ex- 
pressed negative feelings and destructive play 
which was followed by outgoing and more 










positive behavior. The only difference would 
seem to be that the children whose IQ’s were 
raised had completed their therapy. 

The same behavior was shown by the children 
of average intelligence whose IQ’s did not 
change. This group was included to indicate 
that mental deficiency was not the cause of be- 
havior problems for these children. 

Axline [5] did not claim that nondirective 
play therapy raised the IQ of the children of 
group two. She explained the increase in IQ 
scores by saying the child was freed from 
emotional constraint and could thus more ade- 
quately express his true capacities. 

Nondirective play therapy in the treatment 
of personality disorders. While most of the 
work in play therapy has been done in the area 
of personality, there is a dearth of research ma- 
terial. This can be explained by the client- 
centered philosophy from which nondirective 
play therapy sprang. Play therapy is oriented 
around the needs of the client and not around 
the demands of research. For this reason there 
are “cases” offered to prove that play therapy 
works, but there are still few research studies 
undertaken to see how well it works. Bloom- 
berg’s [11] work, already discussed, represents 
the research aspects of proselytizing for play 
therapy. Instead of presenting one case, she 
presents ten. But she presents these ten cases 
in a manner in keeping with the philosophy of 
nondirective play therapy. That is to say, the 
experiment was not designed to stimulate re- 
search, it was designed to help children. Con- 
sequently, her material merely indicates that 
play therapy works — a fact which no one dis- 
putes. 

An experiment that deserves to be a model 
for future play therapy work is available in 
this area. Fleming and Snyder [16] en- 
deavored to determine if measurable changes 
in social and personal adjustment resulted from 
nondirective play therapy. 


They had three simple personality tests adminis- 
tered to 46 children. Seven children who ranged in 
age from eight to eleven years were selected for 
play therapy on the basis of poor results in these 
tests. After a lapse of 12 weeks, 30 of the 46 
children were available for retesting. 

Fleming and Snyder [16] found the three girls 
had improved their adjustment with a greater 
amount of positive feelings. The least amount of 
improvement for the girls was in the social area. 




Save for one individual who fared worse, the four 
boys made no significant changes. The control 
group posttest score was the same as their pretest 
score. 

From an analysis of the group scores as well as 
from individual data, they concluded the greatest 
change for the subjects was in personal feelings 
toward the self and in daydreaming. Hence, the 
theory is offered that personal changes in adjust- 
ment must precede social change. The therapy ex- 
perience had created more positive feeling among 
the subjects but it did not cause the control group 
to like them any better. 


Since this was the first study of its kind, it 
was to be expected that certain of their findings 
should contradict some of Axline’s [2] early 
observations. Although Fleming and Snyder 
did not indicate it, the following contradictions 
of Axline’s observations were suggested: 


1. The therapist’s sex was an important factor in 
establishing rapport. They found ten-year-old boys 
would not respond well to a female therapist. Ax- 
line had said, “Nor does the sex of the therapist 
seem to be important [for successful play therapy]” 
[2, p. 65]. 

2. A housemother who was not given therapy 
prevented successful therapy with the boys’ group. 
Previously, Axline had stressed that, “It is not 
necessary for the adults to be helped in order to 
insure successful play-therapy results” [2, p. 68]. 

3. The best therapeutic results seem to be achiev- 
ed when the children in the group have the same 
degree of maladjustment. Axline had said, “Experi- 
ments in groupings indicate that there are no... 
rules to govern them: Successful groupings have 
included both sexes, siblings, and wide age ranges” 
[3, pp. 269-270]. In another place she noted, 
“A handicapped child can be treated in a group 
with normal children” [3, p. 27]. 


Axline’s statements would seem to warrant 
additional experimental investigation. 


Nondirective play therapy in the treatment 
of children with physical handicaps. Axline 
[2] included cases of handicapped children in 
her pioneer work. Cowen and Cruickshank 
[12, 13] undertook the only other known re- 
search study to supplement her reports. They 
set themselves the problem of determining 
whether or not nondirective group play therapy 
could be applied to physically handicapped 
children. 


They held 13 meetings with five physically handicapped 
children all of whom had at least one 
emotional problem. The children’s teachers and 
parents made an essay-type report on the child’s 




problems at the start of the program. At the last 
meeting similar reports were filled out again. 

In a verbatim account the authors recount the 
play of a hemophiliac child. This boy would pre- 
tend to cut the therapist’s fingers to cause him to 
bleed to death. The imaginary blood was collected 
in glass jars placed around the therapist. 

The investigators found three of the children 
showed considerable observed improvement in both 
the home and the school. One child made slight 
reported gains, and one showed no improvement. 
They concluded that “the nondirective play-group 
offers an ideal setting for the self-solution for a 
particular type of emotional problem; namely, 
those stemming from the specific disability of the 
physically handicapped child” [13, p. 214]. 


The investigators themselves realize their 
work has been conducted at a very gross level. 
They point out several weaknesses of the 
project. Among these weaknesses is the lack 
of quantitative material. There were no pre- 
and posttherapy tests. There was no follow- 
up study to see if the indicated gains were 
temporary or cyclical in nature. Nor was a 
control group utilized to demonstrate more 
clearly that the play situation was the critical 
factor. 


The recognition of such lacks is a healthy 
sign. It suggests an awareness of the possibility 
of improving future research work in play 
therapy by more rigorous procedure. 

Nondirective play therapy in the handling of 
race conflicts. The effectiveness of play thera- 
py for small groups of children who had diffi- 
culty adjusting to other children was the pri- 
mary purpose of an investigation by Axline, 
“Play Therapy and Race Conflict in Young 
Children” [3]. She selected four groups of 
four children, two boys and two girls, who 
were either withdrawn or aggressively anti- 
social. 


Each group met once a week for ten meetings. 
After the tenth meeting the children were mixed 
for five additional meetings. Axline [3] found that 
Negro girls were accepted by the group after the 
seventh meeting. This new acceptance was carried 
over into the classroom. In all groups, “There was 
a tendency to participate in the group meetings 
with an awareness of the rights of others” [3, pp. 
309-310]. The race problem was never an issue 
during the five mixed meetings. 


This study is more provocative than it is 
definitive. The children selected were antisocial 
and not racially bigoted. It is quite possible 
that the figure of an habitual intergroup 
improvement in social relations was dressed in 
the false whiskers of lessening social antago- 
nism. In that case Axline has given an old 
phenomenon a new name. 

Nondirective play therapy in the treatment 
of reading disabilities. Since reading difficulty 
is frequently associated with emotional dis- 
turbances, it is not unexpected to find several 
research articles on the effectiveness of a thera- 
peutic approach designed to relax and better 
adjust children who are retarded readers. 


Axline [1] reported a study of 50 second-graders, 
listed as poor readers by their teachers, who were 
given a reading test. The 37 who received the low- 
est scores were placed in a special class. At the 
end of the semester, three and a half months later, 
intelligence and reading tests were administered. 

There were 8 girls and 29 boys in the groups 
with Stanford-Binet IQ’s ranging from 80-148. Un- 
like most remedial reading classes, these children 
had all their school work in one room with the 
same teacher. The reading problems were consid- 
ered to be part of the whole child. “The children 
came first. The reading, writing, and arithmetic 
came secondly” [1, p. 65]. The children were given 
the opportunity for ample emotional expression. In 
accordance with the techniques of nondirective play 
therapy their feelings and attitudes were not only 
accepted but were also clarified. No remedial read- 
ing instruction per se was given. 

Axline [1] found that 21 children gained more 
than the maturationally expected 3.5 in words. In 
the case of four subjects there was a noteworthy 
difference in the first and second IQ score. One 
subject’s score was increased from 83 to 119. 


This study would seem to indicate that nondirective 
therapeutic procedures are effective in 
building up a readiness to read in children. 

In a later and briefer report Axline [6] 
studied three problem readers. Interestingly 
enough, only two of the children were poor 
readers, while the third child read too much. 
This child used books as a substitute for 
friends. All the children were above average 
in intelligence. 

It was found that the feelings expressed in 
play brought out emotional problems that could 
easily account for the reading difficulties. Ax- 
line concluded, “Given the opportunity the 
child can and does help himself” [6, p. 161]. 

Neither of Axline’s [1, 6] reports included 
experimental controls. Bills [9], working with 
22 slow learners in the third grade utilized 










three 30-day periods of study. The first period 
was a control period in which all the children 
were tested with oral reading, silent reading, 
and Stanford-Binet tests. At the end of that 
period all the children were retested. They 
were also retested at the end of the second and 
third periods. 


The second period was the therapy period. The 
four children with the largest discrepancy between 
mental age and reading age were given nondirec- 
tive play therapy. These children all had high IQ’s. 
Four other children were selected whose IQ’s were 
approximately average. 

The third period was used as a follow-up period 
in which the children were tested again. 

During the experiment, reading instruction was 
not remedial in nature and it was kept constant for 
all members of the class. Thus, a single group was 
compared with itself for three 30-day periods. Each 
child was his own control as the three periods were 
comparable in regard to reading experiences. 

The therapy group made a significantly greater 
gain in the therapy period than it did in the control 
period. The gains of the therapy group during both 
the second and third periods of the study were sig- 
nificantly greater than the gains during the first 
period of study. 

The gains in reading ability appeared immedi- 
ately after therapy for some children and after a 
short period following therapy for others. The 
gains were found to be present six weeks (30 
school days) after therapy had ended. 

Three judges agreed that five of the children had 
gained in emotional adjustment following play 
therapy. However, the design of the study did not 
permit conclusions as to the effect of maladjust- 
ment on children’s reading ability. 


To answer the question as to whether im- 
proved reading ability was due to improved 
personal adjustment Bills [10] conducted play 
therapy with well-adjusted readers. The de- 
sign of the second study was similar to that 
of the first save that children were now select- 
ed for good adjustment by projective and ob- 
jective personality tests. 


He found reading gains were not signifi- 
cantly greater during the therapy period. So, 
it would appear that nondirective play therapy 
may improve reading in those children where 
emotional maladjustment exists with the retardation. 
Consequently, play therapy is not necessarily 
the method of choice for all retarded 
readers as the reports of Axline [1, 6] might 
suggest. 


Follow-up studies of nondirective play therapy. 
While it seems to have been demonstrated 
that play therapy is productive of personality 
improvement, it has not been shown whether 
the effects of therapy are permanent or tempo- 
rary. Consequently, the value of follow-up 
studies cannot be denied. 

Bills [9] found improved reading ability 
present six weeks after therapy. Axline [4] 
reports on an interesting study of a boy whose 
IQ was 65; upon retest it was 68. Six months 
after play therapy his IQ was 96. A year later 
it had gone up to 105. Part of this gain may 
be ascribed to test familiarization. However, 
the lasting effects of play therapy are again 
suggested. 

In a long-range follow-up, Axline [7] selected 
30 successful play-therapy case records. 
Of these, 22 subjects were available for follow- 
up study. Nineteen of the subjects were still 
successfully adjusted a year later, two were 
successfully adjusted three years later, and one 
five years after the original contacts. A follow- 
up of 24 of the 37 children used in previous 
research [1] was made five years later. Of 
this group originally designated as poor readers 
five were honor-roll students and four others 
had reading skills adequate for their grade 
placement. 

The effects of play therapy then would seem 
to be lasting, particularly in the area of per- 
sonality adjustment. 


Summary and Critique 


The principles and methods of nondirective 
play therapy are frequently presented as 
though they were firmly established. The assured 
manner of writing of many of the authors 
and the large-scale possibilities held before 
the reader, tend to make one believe that, 
at long last, “the way” has been found. Actu- 
ally, this is not so. Indeed, it may not be the 
specific procedures of play therapy, per se, that 
effect the rather remarkable personality 
changes. The children may be benefiting from 
having someone constantly and consistently interested 
in their welfare. Those with experience 
in hospitals and institutions involving the 
mentally ill have noticed the unusually high 
percentage of cures attending any new treat- 
ment. They have reported that it is not the 
treatment method that effects the improve- 
ment, rather it is the increased interest taken 







in the patient. So, too, may it be with non- 
directive play therapy. 

Axline [2] presented no experimental evi- 
dence to prove the worth of play therapy. A 
search of the literature reveals fewer than 
twenty published articles on the therapeutic 
uses of play therapy. To be admitted to the 
ranks of approved therapeutic methods non- 
directive play therapy needs more than en- 
thusiasm, belief, and the shibboleth, “It works, 
if you only try it.” 

The greatest weakness of nondirective play 
therapy lies in this impetuous overlooking of 
the real need for a foundation in research. The 
most pressing need is for the employment of 
controls in play therapy. The personal adjust- 
ment of two equated groups should be assessed 
before therapy. The children in the group 
should then be randomly assigned to experi- 
mental and control groups. Upon termination 
of therapy all the children should be retested 
to determine the quantitative effectiveness of 
the play therapy technique in “helping child- 
ren attain maturity.” 
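The controlled design proposed above can be sketched in outline. The sketch is a minimal illustration under stated assumptions: the function names, the child identifiers, and every score are hypothetical, and a real study would add a significance test on the difference in gains.

```python
import random
from statistics import mean

def assign_groups(child_ids, seed=0):
    """Randomly split children into experimental and control halves."""
    rng = random.Random(seed)
    shuffled = list(child_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def mean_gain(pre, post, group):
    """Mean posttest-minus-pretest change for the children in group."""
    return mean(post[c] - pre[c] for c in group)

# Hypothetical adjustment scores, invented purely for illustration.
pre = {"a": 40, "b": 42, "c": 38, "d": 41, "e": 39, "f": 43}
post = {"a": 47, "b": 44, "c": 45, "d": 42, "e": 46, "f": 44}

experimental, control = assign_groups(pre)
# In a real study only the experimental group would receive therapy
# between the two testings; here the numbers are made up.
effect = mean_gain(pre, post, experimental) - mean_gain(pre, post, control)
print(experimental, control, effect)
```

Comparing the gain of the experimental group with the gain of the control group, rather than looking at posttest scores alone, separates the effect of therapy from maturation and from test familiarization.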

Nondirective play therapy, while promising 
when evaluated subjectively, has been seen to 
have rather serious methodological lacks. One 
cannot concur with Kanner [17] that play 
work with children, while still in its beginning 
stages, “has come to stay,” until play therapy 
has been established by objective means. In the 
long run, nondirective play therapy should 
stand or fall on the results of experimental 
studies investigating its effectiveness in relation 
to other procedures. 


Received October 2, 1952. 


References 
1. Axline, Virginia. Nondirective therapy for 
poor readers. J. consult. Psychol., 1947, 11, 
61-69. 

2. Axline, Virginia. Play therapy. Boston: Houghton 
Mifflin, 1947. 

3. Axline, Virginia. Play therapy and race conflict 
in young children. J. abnorm. soc. Psychol., 
1948, 43, 300-310. 

4. Axline, Virginia. Some observations of play 
therapy. J. consult. Psychol., 1948, 12, 209-216. 

5. Axline, Virginia. Mental deficiency—symptom 
or disease? J. consult. Psychol., 1949, 13, 313-327. 

6. Axline, Virginia. Play therapy—a way of 
understanding and helping reading problems. 
Childh. Educ., 1949, 26, 156-161. 

7. Axline, Virginia. Play therapy experiences as 
described by child participants. J. consult. Psychol., 
1950, 14, 53-63. 

8. Baruch, Dorothy W., & Miller, H. Group and 
individual psychotherapy as an adjustment in 
the treatment of allergy. J. consult. Psychol., 
1946, 10, 281-284. 

9. Bills, R. E. Nondirective play therapy with 
retarded readers. J. consult. Psychol., 1950, 14, 
140-149. 

10. Bills, R. E. Play therapy with well-adjusted 
readers. J. consult. Psychol., 1950, 14, 246-249. 

11. Bloomberg, C. M. An experiment in play 
therapy. Childh. Educ., 1948, 25, 177-180. 

12. Cowen, E. L., & Cruickshank, W. M. Group 
therapy with physically handicapped children: 
II. Evaluation. J. educ. Psychol., 1948, 39, 
281-297. 

13. Cruickshank, W. M., & Cowen, E. L. Group 
therapy with physically handicapped children: 
I. Report of study. J. educ. Psychol., 1948, 39, 
193-215. 

14. Dorfman, Elaine. Play therapy. In C. R. 
Rogers, Client-centered therapy. Boston: 
Houghton Mifflin, 1951. Pp. 235-277. 

15. Finke, Helene. Changes in the expression of 
emotionalized attitudes in six cases of play 
therapy. Unpublished master’s thesis, Univer. 
of Chicago, 1947. 

16. Fleming, Louise, & Snyder, W. U. Social and 
personal changes following nondirective group 
play therapy. Amer. J. Orthopsychiat., 1947, 
17, 101-116. 

17. Kanner, L. Play investigation and play treatment 
of children’s behavior disorders. J. Pediat., 
1940, 17, 533-546. 

18. Landisberg, Selma, & Snyder, W. U. Nondirective 
play therapy. J. clin. Psychol., 1946, 
2, 203-214. 

19. Lebo, D. The relationship of response categories 
in play therapy to chronological age. 
Child Psychiat., 1952, 2, 330-336. 

20. Miller, H., & Baruch, Dorothy W. Psychological 
dynamics in allergic patients as shown 
in group and individual psychotherapy. J. 
consult. Psychol., 1948, 12, 111-115. 

21. Rogers, C. R. Counseling and psychotherapy. 
Boston: Houghton Mifflin, 1942. 

22. Rogers, C. R. Client-centered therapy. Boston: 
Houghton Mifflin, 1951. 

23. Rousseau, J-J. Emile. New York: Dutton, 
1925. 







Journal of Consulting Psychology 
Vol. 17, No. 3, 1953 


The Relative Effectiveness of Larger Units Used 
in Interview Analysis 


John E. Muthard 


Ohio State University² 


This study investigated the relative effective- 
ness for studying client and counselor inter- 
action of several larger units used in the analy- 
sis of transcribed interviews. This basic meth- 
odological question needs to be resolved so that 
future interview research can use the optimum 
unit, and so that findings from different studies 
can be compared and integrated. In addition, 
the study was concerned with the extent to 
which the units studied enable one to deter- 
mine whether the different roles assumed by 
the client and counselor during an interview 
affect their respective behavior. 

Protocol research has used a wide range of 
units to group client and counselor statements 
for the purpose of analysis. Early investi- 
gators in this area were principally concerned 
with studying single speech units, e.g., single 
remark and idea units. More recently, re- 
searchers have become more concerned with 
client and counselor behaviors in larger seg- 
ments of the interview. The larger units used 
have been: the whole interview, discussion 
topics, and fractions of the interview or inter- 
view series [4, 5]. Larger units have several 
advantages over single speeches. They (a) 
have been found to be more reliably rated, (b) 
permit the study of meaningful structures in 
the interview which would otherwise be ig- 
nored, and (c) permit the study of delayed 
effects of both client and counselor responses 
on the progress of the interview. Comparing 
and integrating the findings arising from such 
a multiplicity of units is very difficult. 


1 Supported in part from funds granted to the 
Ohio State University by the Research Foundation 
for aid in fundamental research. 

2 Now at Emory University. 


Procedures 

Interview sample. To permit the widest 
possible generalization from this study, interviews 
were selected which represent not only 
different counselors, different clients, and dif- 
ferent type problems, but also different institu- 
tions. Heretofore, each researcher has only 
used protocols obtained in his own institution. 
As may be surmised, different counseling 
centers attract different types of clients and 
problems and also often vary in the general 
counseling approach used. Thirty-six interviews 
were selected from a pool of 267 verbatim 
typescripts prepared from electrically recorded 
interviews at four universities: Chicago, Min- 
nesota, Missouri, and Ohio State. The final 
sample included 12 client-counselor relation- 
ships. There were three sets of three consecu- 
tive interviews from each of the institutions. 
Wherever possible the cases selected included 
all the interviews in the series of contacts, but 
when this was not feasible—because of the 
length of the case—three consecutive inter- 
views were selected from the case. 

Units of the counseling interview studied. 
The larger units studied here include the dis- 
cussion topic, fraction, and problem area. It 
was thought that the whole interview merited 
separate study since it has been found to have 
a definite pattern of organization. 

The discussion topic represents all the con- 
versation of both client and counselor about 
the same topic or subject. The division points 
for this unit were determined by the pooled 
judgments of three independent raters. After 
marking the division of the interviews inde- 
pendently, the judges met in conference to 
reconcile any differences. The character and 
merit of this unit is discussed at length by 
Robinson [3]. For a sample of nine inter- 
views, two or all three judges agreed exactly 
or within the same remark for 83 per cent of 
the discussion-topic division points. 
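One way such an agreement figure could be computed is sketched below. The source does not say how the percentage was formed, so the denominator (all division points proposed by any judge) and the data are assumptions, and the function name is hypothetical.

```python
from collections import Counter

def division_agreement(judge_marks):
    """Share of proposed division points marked by at least two judges.

    judge_marks: one set per judge holding the indices of the remarks
    at which that judge placed a topic boundary.
    """
    counts = Counter(idx for marks in judge_marks for idx in set(marks))
    if not counts:
        return 0.0
    agreed = sum(1 for n in counts.values() if n >= 2)
    return agreed / len(counts)

# Hypothetical boundaries from three judges for one interview.
judges = [{5, 12, 20, 31}, {5, 12, 21, 31}, {5, 13, 20, 31}]
print(round(division_agreement(judges), 2))
```

Treating a boundary as "agreed" when it falls on the same remark index corresponds to the "exactly or within the same remark" criterion; a looser tolerance would raise the figure.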




A problem-area unit consists of all the con- 
tiguous statements dealing with the same kind 
of problem. The initial step in determining 
these units is the same as for discussion topics. 
But since several successive topics may be as- 
pects of the same type of problem, it is next 
necessary to group together those topics in the 
same class. The unit classification categories 
suggested by Robinson [4] were used, i.e., ad- 
justment, skill, and special categories, to which 
was added a new class, test interpretation. 
Units in which the counselor told the client 
what the test meant and evoked little client 
participation were placed in this category. In 
the sample of interviews used to examine the 
reliability of demarcating this unit, two or all 
three judges agreed within the same remark 
85 per cent of the time. 
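The second step, merging successive discussion topics that fall in the same class, can be sketched as below. The topic labels are hypothetical and the function name is an assumption; only the class names come from the text.

```python
from itertools import groupby

def problem_area_units(topics):
    """Merge contiguous discussion topics that share a problem class.

    topics: list of (topic_label, problem_class) in interview order.
    Returns a list of (problem_class, [topic_label, ...]).
    """
    return [
        (cls, [label for label, _ in run])
        for cls, run in groupby(topics, key=lambda t: t[1])
    ]

# Hypothetical sequence of discussion topics for one interview.
topics = [
    ("study habits", "skill"),
    ("note taking", "skill"),
    ("roommate conflict", "adjustment"),
    ("aptitude scores", "test interpretation"),
    ("homesickness", "adjustment"),
]
for cls, labels in problem_area_units(topics):
    print(cls, labels)
```

Because only contiguous topics are merged, the two adjustment topics in the example remain separate units; this is why there can never be more problem-area units than discussion topics.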

As a unit, the fraction comprises a definite 
percentage of the total remarks within a series 
of counseling interviews. For this research, 
quintiles of each series were the units used. Some 
investigations using this unit are described by 
Raskin [2]. The reliability of determining its 
bounds is obviously quite high. 
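A minimal sketch of the fractional unit follows, assuming the remarks of a series are held in one ordered list. The function name and the sample of 23 remarks are hypothetical.

```python
def quintile_units(remarks, parts=5):
    """Split an ordered list of remarks into `parts` contiguous,
    nearly equal fractions (sizes differ by at most one)."""
    n = len(remarks)
    bounds = [(i * n) // parts for i in range(parts + 1)]
    return [remarks[bounds[i]:bounds[i + 1]] for i in range(parts)]

# Hypothetical series of 23 remarks, numbered in order of occurrence.
series = list(range(1, 24))
for unit in quintile_units(series):
    print(len(unit), unit)
```

The boundaries depend only on the count of remarks, which is why no judgment is involved in fixing them.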

Dimensions of the counseling process used. 
To compare the three types of units studied, 
the behavior within each unit had to be con- 
sidered. This was done by selecting dimensions 
which met the following criteria: reliability 
in rating, representativeness of the dimensions 
now used, and frequency of use in research. 
The concepts selected for use in this investi- 
gation were two counselor dimensions: amount 
of lead and counselor assumption of responsi- 
bility, and two client dimensions: client as- 
sumption of responsibility and working rela- 
tionship [2]. Amount of lead refers to the 
degree to which a counselor’s remark seems to be 
ahead of the content of the client’s last remark 
or brings pressure upon the client to adopt new 
ideas. Counselor assumption of responsibility 
is concerned with the amount of responsibility 
which the counselor assumes, which he per- 
mits the client to assume, or which he forces 
the client to assume. Working relationship re- 
fers to the degree that there is a mutual respect 
between the participants and the degree the 
client feels free to present his problems. In 
rating client assumption of responsibility, the 
judge considers the extent to which the client, 
in light of what the counselor is permitting 




him to do, takes an active role in clarifying his 
problem and participates in making plans for 
dealing with it. 

Judging procedures. Persons participating 
in the ratings for this research were advanced 
graduate students in the student personnel pro- 
gram at the Ohio State University or had 
equivalent background or experience. They 
were trained in making whatever judgments 
they participated in. Three judges demarcated 
the units and rated the three global dimen- 
sions. Only two raters judged amount of lead, 
since this remark-by-remark rating can be done 
with high reliability. Precautions were taken 
to avoid introducing systematic variations in 
rating which might favor any of the three units 
studied. 

Hypotheses. Four hypotheses were proposed 
as means of evaluating these units. Two of 
these dealt with reliability. They were: Hy- 
pothesis 1: There is a difference between the 
interrater reliability coefficients obtained for 
the ratings of the three types of units. Hy- 
pothesis 2: The units differ in the degree 
to which the first half of each unit correlates with 
the second half. It was further predicted, in 
both instances, that ratings for problem-area 
units would be most reliable and those for 
quintile units would be least reliable. 

The two other hypotheses considered a unit’s 
capacity for distinguishing useful segments of 
the interview. That is, they examined the ex- 
tent to which the units studied make varia- 
tions within the interview discernible. Hy- 
pothesis 3 was: There is a difference between 
the correlations obtained for ratings of two 
halves of a unit and those obtained for ratings 
of the last half of a unit and the contiguous 
half of the next unit. In the case of discussion-topic 
and problem-area units it was predicted 
that ratings of within-unit halves would 
correlate to a greater degree than those of be- 
tween-unit halves. Fractions were predicted as 
not showing such a difference. Hypothesis 4 
was: Observable shifts in client and counselor 
behavior from each unit to its next adjacent 
unit differ when the difference scores³ of the 
three units under study are compared. It was 
predicted that shifts would be greatest in the 
problem-area unit and least in fractional units. 


3 The numerical difference between mean ratings 
of adjacent units without regard to sign. 










In each case the hypothesis was stated in the 
null form and tested. 
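The difference score used in Hypothesis 4, the numerical difference between the mean ratings of adjacent units taken without regard to sign, can be sketched as follows. The ratings are hypothetical and the function name is an assumption.

```python
from statistics import mean

def difference_scores(unit_ratings):
    """Absolute difference between mean ratings of adjacent units.

    unit_ratings: list of lists, one list of ratings per unit,
    in the order the units occur in the interview.
    """
    means = [mean(r) for r in unit_ratings]
    return [abs(b - a) for a, b in zip(means, means[1:])]

# Hypothetical ratings on one dimension for four successive units.
ratings = [[3, 4, 5], [6, 6], [2, 4], [3, 3, 3]]
print(difference_scores(ratings))
```

A type of unit that marks off real shifts in behavior should yield larger difference scores than one whose boundaries fall arbitrarily.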


Results 


The reliability coefficients for the combined 
ratings indicated that the variables could be 
rated consistently enough to permit their use 
in research. For the three different-type units, 
the median reliability coefficient for the com- 
bined ratings on amount of lead was .93. 
Similarly, the median for counselor assump- 
tion of responsibility was .90, for client as- 
sumption of responsibility .72, and working re- 
lationship .71. Table 1 presents the complete 
reliability data for combined ratings. 


Table 1 
Reliability Coefficients for Combined Ratings 
(Spearman-Brown prophecy formula used to 
take into account combined ratings) 

                                          Discussion   Problem 
Dimension                                 topic        area       Fractions 
Amount of lead                            .93          .93        .96 
Counselor assumption of responsibility    .90          .86        .94 
Client assumption of responsibility       .72          .71        .73 
Working relationship                      .71          .69        .77 





In terms of reliability, no one of the three 
units was found more advantageous than an- 
other. In only one of the twelve comparisons 
of correlations obtained from combined ratings 
was the obtained difference significant. In this 
instance, fractions were rated more reliably 
than problem-area units on the dimension 
counselor assumption of responsibility. Since 
such a difference might occur by chance dur- 
ing the course of 12 comparisons, the findings 
were interpreted as not providing sufficient 
evidence to reject the null hypothesis. 

Neither could the hypothesis that the units 
do not differ in the degree to which the first half 
of each unit correlates with the second half 
be rejected. The largest critical ratio obtained 
was .96 which failed to meet the fiducial 
limits set. Table 2 presents the correlation co- 
efficients obtained in testing this hypothesis. 

Although these findings concerning reliabili- 
ty do not support the alternatives proposed, 


John E. Muthard 


Table 2 
Correlation Coefficients for Mean Ratings 
of Within-Unit Halves 

                            Dimensions 
                      Amount       Working 
Type unit             of lead      relationship 
Problem area            .79           .90 
Discussion topic        .75           .89 
Fractions               .72           .86 





they are amenable to explanation. Since the 36 
interviews were divided into 60 fractional, 106 
problem-area, and 174 discussion-topic units, 
it is apparent that fractions of the interview 
were generally three times as long as discussion 
topics and almost twice as long as problem- 
area units. Thus, the reliability for rating 
fractions was enhanced, since in rating or 
measurement, the larger the sample, the higher 
the reliability usually is for judging. 

Another important criterion of the effective- 
ness of an interview unit for studying client 
and counselor behavior is the degree to 
which it makes discernible variation from one 
unit to another within an interview or series 
of interviews, that is, the extent to which it 
enables one to observe the variety of behaviors 
incorporated within an interview. 

Hypothesis 3 can be rejected at the .001 
level of confidence for both problem-area and 
discussion-topic units on both of the variables 
tested. In the comparisons involving these two 
units, the correlations for within-unit halves 
were always significantly greater than those 
for between-unit halves. However, there was 
not a significant difference for the fractional 
units studied. These findings are shown in 
Tables 3 and 4. 

A test of the sensitivity to variations in client 
and counselor behavior of the three types of 


Table 3 
Significance of the Differences between Correlations 
of Amount of Lead Ratings for Within-Unit 
Halves and Between-Unit Halves 

                            Correlations   Correlations 
                            for within-    for between-   Critical 
Type unit             N     unit halves    unit halves    ratio 
Discussion topic     147       .75            .49          3.71* 
Problem area          81       .79            .35          4.40* 
Fractions             48       .72            .85          1.65 

*Significant at the .001 level using one-tail test. 


Effectiveness of Larger Units in Interview Analysis 


Table 4 
Significance of the Difference between Correlations 
of Working Relationship Ratings for Within-Unit 
Halves and Between-Unit Halves 

                            Correlations   Correlations 
                            for within-    for between-   Critical 
Type unit             N     unit halves    unit halves    ratio 
Discussion topic     147       .89            .66          5.34* 
Problem area          81       .90            .57          5.15* 
Fractions             48       .86            .86          0.00 

*Significant at the .001 level of confidence using the 
one-tail test. 


Table 5 
Significance of the Difference between Sets 
of Difference Scores 
(Based on analysis by Mann-Whitney U test [1]) 

                                               Unit Comparisons 
                                         Problem area   Topic          Problem area 
                                         vs. fraction   vs. fraction   vs. topic 
Dimension                                     z*             z*             z* 
Amount of lead                               5.31†          4.81†          1.50 
Counselor assumption of responsibility       3.41†          3.30†           .76 
Client assumption of responsibility          2.62†          1.48           1.64‡ 
Working relationship                         2.67†          1.81‡          1.21 

* Similar to the critical ratio. 
† With a z > 1.96 the null hypothesis can be rejected 
at the .01 level of confidence. 
‡ With a z > 1.64 the null hypothesis can be rejected 
at the .05 level of confidence. 





units was made with hypothesis 4. The re- 
sults obtained from examination of this hy- 
pothesis are shown in Table 5. 
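The comparison behind Table 5 can be sketched in two steps: form difference scores (the absolute difference between mean ratings of adjacent units, as defined in the footnote to hypothesis 4), then compare two sets of such scores with the Mann-Whitney U test in its normal approximation. The mean ratings below are hypothetical, the function names are illustrative, and no correction for ties is applied to the variance, which matters when many scores are identical.

```python
import math


def difference_scores(unit_means):
    """Absolute differences between mean ratings of adjacent units."""
    return [abs(b - a) for a, b in zip(unit_means, unit_means[1:])]


def mann_whitney_z(sample_a, sample_b):
    """Normal-approximation z for the Mann-Whitney U test.
    Positive z means sample_a tends to be larger. No tie correction."""
    pooled = sorted((value, group)
                    for group, sample in enumerate((sample_a, sample_b))
                    for value in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average rank for tied values
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return (u_a - n_a * n_b / 2) / math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)


# Hypothetical mean ratings for successive units of one interview series.
problem_area = difference_scores([3.0, 5.5, 2.0, 6.0, 3.5])
fractions = difference_scores([4.0, 4.5, 4.0, 5.0, 4.5])
print(round(mann_whitney_z(problem_area, fractions), 2))
```

A positive z here corresponds to larger unit-to-unit shifts in the first set, which is the direction the hypothesis predicted for problem-area units.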

On all four of the variables used in this in- 
vestigation, problem-area unit ratings yielded 
difference scores which were significantly 
larger, at better than the .01 level, than those 
obtained for quintiles of the interview series. 
As Table 5 indicates, there is a significant 
difference for three of the dimensions studied 
when discussion topics and fractional units are 
compared. However, when problem-area and 
discussion-topic units were compared, only one 
difference was barely significant. In general 
then, both problem area and discussion topic 
produce greater variation from unit to unit 
than does the use of fractions. We have little 
basis for determining whether the problem area 
or discussion topic is better than the other. It 
should be noted, however, that in this com- 
parison all the differences were in the direc- 




tion of greater sensitivity for problem-area 
units and that 58 per cent of the difference 
scores for these two sets were identical. 

An analysis of client and counselor behavior 
during the course of a “typical” interview sug- 
gested that one of the factors associated with 
behavior changes between successive problem- 
area or discussion-topic units is changing-role* 
behavior. That is, the variations within the 
conference may be a function of the demands 
and expectations which the client and counselor 
perceive in the counseling relationship for dif- 
ferent types of problems and topics. This re- 
search suggests that a more systematic evalua- 
tion of client and counselor roles may lead to 
a better understanding of the dynamics of the 
interview, i.e., the interaction phenomena and 
variations in client and counselor behavior. 

Discussion 

We have found that discussion-topic and 
problem-area units when compared to a frac- 
tional unit (a) show greater sensitivity to 
variations in behavior during the course of a 
conference and (b) bring together client and 
counselor responses which reflect related be- 
haviors and attitudes to a greater degree. 
Thus, it seems imperative that any unit used 
in interview typescript research be based on a 
psychologically meaningful, organizing prin- 
ciple. That is, units should bring together be- 
haviors and attitudes which are of the same 
pattern, and partition those segments of the 
interview which are dissimilar psychologically. 
Discussion-topic and problem-area units met 
this standard. 

The thorough analysis of a single “typical” 
interview and general impressions from read- 
ing over 200 interview protocols suggested the 
possibility of counselor or client roles as a 
basis for unit divisions. A study of roles as- 
sumed in the counseling interview may lead 
to further understanding of the dynamics of 
the interview. 

Since this is one of the first studies which 
use counseling interview records from several 
institutions, it was felt that there would be 
some value in investigating whether the schools 


*For our purposes, Sargent’s [6] definition of 
role seemed useful. He says, “A person’s role is a 
pattern or type of social behavior which seems situ- 
ationally appropriate to him in terms of the de- 
mands and expectations of those in his group.” 










differ on the dimensions studied. Since the 
data secured represented a limited sampling — 
three different clients in nine interviews— 
from each campus, the findings must be con- 
sidered very tentative. Nonetheless, some sig- 
nificant differences were found in problem 
areas discussed and in the pattern of techniques 
used. The findings therefore suggest that, in 
making studies which are concerned with the 
interaction of client and counselor, the dif- 
ferential distribution of problem types and the 
counseling approaches at each of the schools 
studied should be considered. 


Summary and Conclusions 


This study examined the relative effective- 
ness of three larger units used in interview 
typescript analysis. The three criteria used to 
examine effectiveness were: (a) reliability, 
(b) sensitivity, and (c) degree to which the 
unit brings together related material and parti- 
tions that which is less related. 

It was found that discussion-topic, problem- 
area, and fractional units could all be rated 
with acceptable reliability and that the rating 
reliabilities obtained were not significantly dif- 
ferent. On the other two criteria, it was found 
that both problem-area and discussion-topic 
units were more effective than quintiles of the 




interview series. Problem-area and discussion- 
topic units were not found to differ on these 
criteria. 

Examination of the protocols suggested that 
changing client and counselor roles may be im- 
portant factors associated with variability with- 
in the conference and are directly associated 
with topic and problem areas. Thus, the dis- 
cussion-topic and problem-area units provide 
an opportunity to study such roles, but quintiles 
of the interview series do not. 


Received October 20, 1952. 


References 


1. Mann, H. B., & Whitney, D. R. On a test of 
whether one of two random variables is stochas- 
tically larger than the other. Ann. math. Sta- 
tist., 1947, 18, 50-60. 

2. Raskin, N. J. An analysis of six parallel studies 
of the therapeutic process. J. consult. Psychol., 
1949, 13, 206-220. 

3. Robinson, F. P. The unit in interview analysis. 
Educ. psychol. Measmt, 1949, 9, 709-716. 

4. Robinson, F. P. Principles and procedures in 
student counseling. New York: Harper, 1950. 

5. Rogers, C. R. (Ed.) Client-centered therapy. 
Boston: Houghton Mifflin, 1951. 

6. Sargent, S. S. Role and ego in psychology. In 
J. H. Rohrer & M. Sherif (Eds.), Social psy- 
chology at the crossroads. New York: Harper, 
1951. Pp. 355-370. 


Journal of Consulting Psychology 
Vol. 17, No. 3, 1953 


On the Supposed Behavioral Correlates of an 
“Eye” Content Response on the Rorschach 


Michael Wertheimer 


Wesleyan University¹ 


Because it is clinically of the utmost im- 
portance to know as much as possible about 
the individual case, any kind of information, 
whatever its source, is often considered to be 
of great value. Thus one finds that in practice 
many correlates for test behavior are used 
whose empirical foundations are not optimal. 
If a practitioner has found in his experience 
that A tends to go with B, he may thenceforth 
attribute the characteristic B to patients who 
give him A on a test, and soon it may become 
standard diagnostic practice to consider A a 
“sign” of B. 

This situation is of course inevitable in a 
field so complex as that of personality diag- 
nosis, where there is such a great need for in- 
formation. Much research is going on at pres- 
ent to answer this demand. Considerable 
progress has already been made in the revision 
of theory, such as in the areas of color shock, 
experience balance, and diagnosis of organicity, 
to mention only a few. Empirical devices have 
recently also been brought to bear on the gen- 
eral area of the content of responses on the 
Rorschach test. 

The present study attempts validation of a 
representative content item. The item was 
chosen with regard to two criteria: first, it 
should be a relatively well-accepted interpreta- 
tion; second, it should be one which is rela- 
tively easy to test objectively. 

Lindner [2] writes of Card IV, D1, “The 
shading characteristics of this region encourage 
the perception of a ‘pair of eyes.’ Neverthe- 
less persons with strong paranoid tendencies, 
and those whose symptom complexes include 
ideas of influence or reference invariably note 


1 This research was performed while the writer 
was USPHS Clinical Psychologist at the Worcester 
State Hospital. 


the ‘eyes’ and express the opinion that they are 
‘looking at me.’” Elsewhere he states, “Per- 
sons with strongly paranoidal tendencies, how- 
ever, and whose symptom complexes include 
ideas of influence or reference, deliberately note 
the ‘eyes,’ abstracting them from the head of 
the insect or animal and adding the impression 
that ‘they are looking at me.’ Such responses 
appear with similar frequency from strongly 
inferiority-convicted persons. Among these 
such a response probably represents a projec- 
tion of the idea of being under the close scru- 
tiny of the examination” [3, p. 83]. Hertz 
writes, “Ideas of influence or reference may be 
detected by the persistence of eyes” [1, p. 15], 
and Schafer [4], in reference to paranoids, says 
that some of these patients emphasize “eyes.” 
While analyzing a Rorschach record, he 
writes, “On Card IV, under the impact of 
anxiety, his integrative ability is so impaired 
that he is restricted to the eyes of what is fre- 
quently seen as a full animal head (lower 
middle D), and misses the popular W (animal 
skin) altogether. On Card IX, another fre- 
quently disturbing card, precisely the same 
thing happens: after a long delay he makes out 
a pair of eyes, one of which is frequently in- 
tegrated into a full animal head. The choice 
of content ‘eyes’ can be considered a further 
indication of his paranoid trend . . .” [4, p. 
155]. Courses in practical Rorschach tech- 
nique often consider eyes as an important, or 
even an unmistakable sign of paranoid trends; 
and many workers in the field use it as an im- 
portant indicator in the practical everyday 
work of diagnosis. 

These statements about the relationship be- 
tween paranoid trends and eyes content seem 
to satisfy the two criteria of being well accept- 
ed and relatively easy to test. 




Procedure 


The files of the Psychological Laboratories 
at the Worcester State Hospital were used for 
the purpose of a statistical check. Altogether, 
230 records were examined. An attempt was 
made to find 15 records of males and 15 of 
females in each diagnostic group. In some cate- 
gories less than 30 records were available, so 
that the N was not always the same from 
group to group. The diagnostic categories used 
were: (a) paranoid psychoses including schizo- 
phrenia and other paranoid conditions; (b) 
schizophrenia, other types, which can also be 
called the schizo-affective group; (c) non- 
paranoid and nonaffective schizophrenics, in- 
cluding simple, hebephrenic, and catatonic 
schizophrenics; (d) manic-depressive patients, 
whether manic, hypomanic, depressed, or 
mixed; (e) organic patients, including alco- 
holics, epileptics, and those with disturbances 
of the central nervous system; (f) neurotics, 
independent of further classification; (g) a 
group without psychosis, diagnosed simple 
adult maladjustment; and finally (h) psycho- 
paths. 


For each patient, age, sex, psychiatric diag- 
nosis, Stanford-Binet Vocabulary IQ and the 
total number of responses to the Rorschach 
were recorded. The record was then combed 
to find every occurrence of the word eyes, and 
the number of eye responses was recorded, with 
reference to whether they occurred in the free 
association or in the inquiry, and on which 
card they appeared. The original intent was 
also to record the detail which had given rise 


Table 1 
The Incidence of Eye Responses on the 
Rorschach Cards 

            Number of 
Card        eye responses    Proportion 
I                34             .09 
II               29             .08 
III              22             .06 
IV               83             .22 
V                17             .04 
VI               37             .10 
VII              26             .07 
VIII             23             .06 
IX               54             .14 
X                52             .14 
Total           377            1.00 







to the response, but the data on the protocols 
were usually insufficient to give clear indica- 
tions of location. Records used were obtained 
in the period 1936 to 1952, from patients re- 
ferred to the psychological laboratories by the 
staff of the hospital. All were obtained by 
trained psychologists, ranging in experience 
from intern to director of psychological re- 
search. 


Results 


Table 1 shows the number of times that the 
word eyes occurred in response to the various 
cards. The table is based on the records ob- 
tained from the following groups: paranoid, 
schizophrenia other types, neurotics, and or- 
ganics; 119 records in all. Only 14 female 
organic records were found in the files. Lind- 
ner’s statement about Card IV that the “shad- 
ing characteristics . . . encourage the percep- 
tion of a ‘pair of eyes’” is consistent with the 
finding that Card IV produces the most fre- 
quent mention of eyes. It must be mentioned 
here that the number entered in the table is 
the number of times the word eyes appears in 
the free association, plus the number of times 
it occurs in the inquiry, independent of con- 
text. 

In order to get a more adequate picture of 
the “importance” of eyes in a record, each 
record was corrected for the total number of 
responses. An eye index, the number of times 
the word eyes appeared in the record, divided 
by the total R of the record, was obtained. 
This eye index was plotted against IQ, in 
order to see whether there was any relation- 
ship between the two. Lumping the 230 
records into four groups, Table 2 was obtained. 
Means and medians for each group are given, 
and both show the same trend. The great 
difference between the means and the medians 
is due to the extreme skewness of the distribu- 
tion of the eye index; 91 of the 230 records 











Table 2 
Eye Index as a Function of IQ 

                       Mean eye    Median eye 
IQ range       N       index       index 
60- 79         27       .197        .085 
80- 99         68       .115        .030 
100-119        82       .158        .090 
120-139        53       .166        .100 













included not a single eye response, and there- 
fore almost 40% of the records had an eye 
index of 0.00. There is some indication in 
both statistics that the middle range of intelli- 
gence is on the average less productive of eye 
responses on the Rorschach than are the 
extremes. 
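The eye index defined above, and the gap between its mean and median in a skewed sample, can be illustrated with a brief sketch. The record text and the list of index values are invented for the example, and counting is done on the bare words "eye" and "eyes" regardless of context, as the tabulation in this study describes.

```python
import re
from statistics import mean, median

# Matches the whole words "eye" or "eyes", in any letter case.
EYE_WORD = re.compile(r"\beyes?\b", re.IGNORECASE)


def eye_index(record_text: str, total_responses: int) -> float:
    """Number of occurrences of the word eye/eyes divided by total R."""
    if total_responses <= 0:
        raise ValueError("total_responses must be positive")
    return len(EYE_WORD.findall(record_text)) / total_responses


# Hypothetical protocol excerpt with R = 20.
text = "Two eyes looking at me. An animal head; here is the eye."
print(eye_index(text, 20))

# Hypothetical indices: many zero records pull the median below the mean.
indices = [0.0, 0.0, 0.0, 0.05, 0.10, 0.60]
print(mean(indices), median(indices))
```

Because a large share of records have an index of zero, the median sits well below the mean, which is the pattern the text attributes to the skewness of the distribution.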


To see whether the production of eyes on 
the Rorschach is a function of age, the eye 
index was plotted against chronological age. 
Such a plot, when age is broken down into 
five groups, gives Table 3. Since the means 
give an adequate picture of the relationship, 
medians are not given in the table, although 
the distributions are once more markedly 
skewed. The increase in the number of eye 


Table 3 
Eye Index as a Function of Age 

Age range       N       Mean eye index 
10-19           38          .100 
20-29           73          .130 
30-39           64          .135 
40-49           39          .187 
50-59           16          .218 





responses with increasing age is clear. What 
it signifies is open to speculation. 

To turn to the relation between eye re- 
sponse and diagnosis, Table 4 shows the pro- 
portion of subjects who reported eyes on Card 
IV, divided into the various diagnostic groups. 


Table 4 
The Relation between Incidence of Eye Responses 
and Diagnosis 

                                   Proportion of S’s reporting 
                                   eyes on Card IV 
                                   In free      In 
Diagnostic group                   assoc.       inquiry      N 
Simple adult maladjustment          .03          .28         29 
Schizophrenia, other types          .10          .37         30 
Simple, hebephrenic, and 
  catatonic schizophrenics          .13          .33         30 
Psychopaths                         .1[?]        .38         27 
Organics                            .21          .29         29 
Neurotics                           .13          .43         30 
Paranoids                           .17          .43         30 
Manic-depressives                   .32          .4[?]       25 





This result is somewhat discouraging in that 
we would expect the paranoid patients to 
stand out considerably above the others. Lind- 
ner’s strong statement that paranoids “invari- 
ably” note the eyes is not consistent with this 
finding. Paranoid patients do, however, report 
eyes relatively frequently compared with other 
diagnostic groups. 

Table 5 shows the mean eye index for the 
various diagnostic groups; the result here is 





Table 5 
Eye Index as a Function of Diagnostic Group 

Diagnostic group                   Mean eye index      N 
Paranoids                               .178           30 
Neurotics                               .164           30 
Manic-depressives                       .142           25 
Schizophrenia, other types              .116           30 
Simple, hebephrenic, and 
  catatonic schizophrenics              .111           30 
Psychopaths                             .108           27 
Simple adult maladjustment              .105           29 
Organics                                .097           29 





more encouraging, but it must be remembered 
that this is eye index for all cards, not specifi- 
cally for Card IV alone. 

It is clear from this analysis that a response 
of eyes on Card IV, or a high eye index in the 
record, cannot be used automatically to infer 
paranoid ideation in a subject who produces 
such a record. 

It may well be objected at this point that 
psychiatric diagnosis is far from infallible, and 
that there may be features in a case which are 
more important than paranoid trends, and 
which force the psychiatrist to make a diag- 
nosis of something other than a paranoid dis- 
ease even though a patient may show clear 
suspiciousness. In an attempt to answer the 
critique, a further analysis was performed. A 
file of Keysort Cards, one for each patient ex- 
amined, was available in the office of the Di- 
rector of Psychological Research at the Wor- 
cester State Hospital. These cards include, 
among other information, a code for presence 
or absence of suspiciousness in the case history 
of the patient. Table 6 shows the analysis of 
the suspicious and nonsuspicious groups into 
records which showed at least one response of 
eyes on Card IV as against those which did 
not; also those records in which at least one 







Table 6 
The Number and Proportion of Patients Showing Eye Responses either on Card IV 
or on the Entire Record, as a Function of Whether Their Symptoms 
Included “Suspiciousness” or not. 

Sign                    Not susp.     Suspicious     Total 
Eyes on Card IV          51 (34%)      35 (45%)        86 
No eyes on Card IV      101 (66%)      43 (55%)       144 
All cases               152            78             230 

Eyes in record           86 (57%)      53 (68%)       139 
No eyes in record        66 (43%)      25 (32%)        91 
All cases               152            78             230 

Table 7 
The Number and Proportion of Patients Showing Two or More, or Three or More Eye 
Responses in the Entire Record, as a Function of Whether Their Symptoms 
Included “Suspiciousness” or not. 

Sign                        Not susp.     Suspicious     Total 
2 or more eyes in record     70 (46%)      38 (49%)       108 
Less than 2 in record        82 (54%)      40 (51%)       122 
All cases                   152            78             230 

3 or more eyes in record     55 (37%)      33 (41%)        88 
Less than 3 in record        97 (63%)      45 (59%)       142 
All cases                   152            78             230 








response of eyes occurred on any card as 
against those on which eyes were never men- 
tioned. 

Chi square was calculated for both of these 
two-by-two tables; in each case the probability 
that such distributions may have been drawn 
from a population in which there was no differ- 
ence in the production of eyes between the sus- 
picious and the nonsuspicious groups was just 
under .10. Chi squares for the two tables were 
2.82 and 2.79 respectively; chi square corres- 
ponding to a p of .10 is 2.72; for p = .05, it 
is 3.84. By the generally accepted criteria of 
statistical significance, therefore, it is suggested 
that neither production of eyes on Card IV 
nor production of eyes anywhere in the record 
differentiates between the suspicious and the 
nonsuspicious group. 
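The chi squares reported for the two halves of Table 6 can be reproduced from the cell counts. The sketch below assumes the ordinary uncorrected chi square for a two-by-two table (no Yates correction), which the source does not state explicitly; the p-value uses the fact that, with one degree of freedom, chi square is the square of a standard normal deviate. Function names are illustrative.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected chi square for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2: float) -> float:
    """Upper-tail probability of chi square with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Cell counts from Table 6: (not susp., suspicious) for sign present, absent.
card_iv = chi_square_2x2(51, 35, 101, 43)
any_card = chi_square_2x2(86, 53, 66, 25)
for label, value in [("Eyes on Card IV", card_iv), ("Eyes in record", any_card)]:
    print(label, round(value, 2), round(p_value_df1(value), 3))
```

Under this assumption the computed values agree with the reported 2.82 and 2.79, and both probabilities fall between .05 and .10. Applied to the counts in Table 7, the same functions give probabilities of about .70 and .37.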

The mean eye index of the two groups, .156 
for the suspicious and .106 for the nonsus- 
picious groups, is suggestive of a difference be- 
tween them. 

One other criterion of paranoid trends, that 
of Hertz, remains to be tested. It is claimed 


that the persistence of eyes in a record is the 
determining criterion of suspiciousness in the 
patient. For analytical purposes, Table 7 gives 
the number of records showing two or more, 
and three or more eye responses, for both sus- 
picious and nonsuspicious patients. This might 
be one approach to the definition of “persist- 
ence.” Chi squares calculated on both of these 
come out with probabilities of .70 and .40 of 
obtaining such results by chance, suggesting 
that at least for the present sample, and with 
persistence defined as recurrence, there is no 
difference in the persistence of eyes between 
suspicious and nonsuspicious patients. 

A few other relationships were suggested 
by the data. Females tend to have more eye 
responses in their records than do males, in- 
dependent of diagnostic group; but this result 
was not statistically significant. Also, the sus- 
picious group tended to have fewer responses 
in the whole record than the nonsuspicious, 
but again this difference is not statistically sig- 
nificant. The median values of R for the two 
groups were 15.5 and 16.6 respectively, the 










means 18.7 and 19.2. 

At best, then, we can say that there might 
be some statistical trends in support of the 
statements given in the introduction, but none 
of these trends reaches the criteria of statistical 
significance which are currently in use. Even 
if clearly significant statistical differences were 
found, this would still be a far cry from the 
“invariable” type of relationship, the exist- 
ence of which is sometimes asserted in works 
on Rorschach content analysis. On the other 
hand, it must be said in all fairness that such 
statements of one-to-one relationships are the 
ones which lend themselves most readily to 
quick empirical check. 


Discussion 


Before further generalizations are made on 
the basis of these data, certain rather serious 
considerations must be mentioned. In the tab- 
ulations, a completely objective attitude was 
used; every single time that the word eye or 
eyes occurred, a check-mark was made. This 
independence from context, which the hypothe- 
sis required, has both great advantages and 
grave disadvantages. It makes it impossible for 
bias on the part of the investigator in any way 
to affect the results; it is simple; and it permits 
easy statistical analysis. On the other hand, 
such an atomistic approach to the hypothesis 
may strike the practicing clinician as rather 
naive, in that it may be considered obviously 
incorrect to divorce the word eye completely 
from its context. However, this is a direct 
translation, into experimental terms, of the 
“cookbook” approach to Rorschach interpreta- 
tion. Although the authors of such books may 
often pay lip service to the importance of the 
context of the response, still, in the usual form 
of presentation, a one-to-one correlation of re- 
sponse to diagnostic significance is either im- 
plied or made necessary as a working hypothe- 
sis for the student. It may be argued that, for 
didactic purposes, such oversimplification is 
necessary and justified. It may be, however, 
(and this is the point of this study) that this 
method of teaching destroys the essence, and 
therefore the validity, of the technique. The 
findings of this study certainly seem to point 
to such a possibility. Therefore this study aims 
at the evaluation of a method of teaching the 
Rorschach, and may not have any immediate 




relevance to the interpretation of the Ror- 
schach by an experienced clinician, although 
some implications may be drawn for Rorschach 
interpretation in general. 

It may well be that eyes in a record of a 
certain configuration may be highly related 
with paranoid trends, whereas another gestalt 
results in no predictive significance of eyes at 
all, i.e., there is a world of difference between 
the following response in the inquiry, “Well, 
this is his head; here are the ears, here is the 
nose, and here are the eyes and mouth,” and 
on the other hand, the response in the free as- 
sociation: “All I see on this are two eyes, look- 
ing straight out from the gloom.”² A test in 
which, for example, a number of experienced 
Rorschach workers would be asked to rate each 
eye response should eventually be undertaken, 
but would be a test of a different hypothesis 
than the one tested here. 

The present tendency towards objectifica- 
tion and towards clear-cut statements of re- 
lationships places more and more Rorschach 
materials within reach of the experimenter’s 
techniques. On the other hand, the same type 
of approach, in making extremely explicit 
statements on the basis of clinical experience, 
may sometimes unavoidably carry the connota- 
tion of extensive empirical justification. Such 
an implication regarding a statement for which 
there is as yet insufficient data may blind work- 
ers in the field to the need for objective re- 
search. 

If one considers the present study as an ade- 
quate test of the hypothesis that the eye re- 
sponse, as such, tends to go with paranoid 
trends, certain further ideas offer themselves 
for consideration. 

First of all, this study suggests that the gen- 
eral acceptance of an interpretation is by no 
means an index of its empirical validity. Per- 
haps it is precisely the most commonly accepted 
interpretations which are the ones most criti- 
cally in need of objective analysis. 

Secondly, while the testing of the present 
hypothesis presents some difficulties, other re- 
lationships between test results and behavior 


2 There were less than ten records in the entire 
230 which included responses as extreme as those of 
the second type mentioned, and by no means all of 
these occurred in records of individuals who had a 
paranoid disease or whose case histories gave evi- 
dence of suspiciousness. 







which have become traditional in the field of 
the Rorschach may be even more difficult to 
test. Such relationships may be less definite 
with respect both to test criteria and to be- 
havior correlates. It is less difficult to get ob- 
jective agreement on the presence or absence 
of a word in a record than on items such as 
F+%; on the behavior end, it is easier to 
determine the presence of paranoid trends than 
it is to determine the presence or absence of 
strong hostile desires which are well controlled 
and not acted out. 

Until further research is completed and the 
results made available to the practicing Ror- 
schach worker, perhaps it would not be amiss to 
emphasize caution in the interpretation of Ror- 
schach records. Whereas the general tendency 
in the field today is to make as ominous an in- 
terpretation as possible, at least in a mental 
hospital, perhaps the other extreme is more 
called for, if indeed one extreme is to be pre- 
ferred to the other. Since most of the indices 
in use at present are probably not nearly as 
accurate differentiators as we would have them 
be, we would be more likely to help a patient 
by making definite interpretations only of 
those few things on which there is good evi- 
dence, and for the remainder, by being aware 
of the many possible alternative implications 
(some perhaps pointing to assets and to health) 
and the good-sized margin of error. 

This study, then, indicates that particular 
caution should be exercised in making one-to- 
one interpretations, and leads to the suggestion 
that attempts to find objective Rorschach cor- 
relates for different behavior patterns would 
profit more by investigating the configurations 







and interactions of various indices, than by at- 
tempting to assign individual and invariable 
meaning to the indices. 


Summary 


Two hundred thirty Rorschach records from 
different diagnostic groups were examined for 
the frequency and incidence of eye content re- 
sponses. Although much of the literature states 
that an eye content response to the Rorschach 
is indicative of paranoid or suspicious trends 
in the subject, this statement is not borne out 
by the present data. Paranoid patients did not 
produce significantly more eye content re- 
sponses than did other diagnostic groups, nor 
did patients whose symptoms include sus- 
piciousness produce more eye content responses 
than did patients whose symptoms did not in- 
clude suspiciousness. Caution is urged in the 
use of one-to-one behavior correlates of Ror- 
schach signs. 


Received October 20, 1952. 


References 


1. Hertz, Marguerite R. Suicidal configurations in 
Rorschach records. Rorschach Res. Exch., 1948, 
12, 3-58. 

2. Lindner, R. M. Analysis of the Rorschach test 
by content. J. clin. Psychopathol., 1947, 8, 707- 
719. 

3. Lindner, R. M. The content analysis of the 
Rorschach protocol. In L. E. Abt & L. Bellak 
(Eds.), Projective psychology. New York: 
Knopf, 1950. 

4. Schafer, R. Clinical application of psychological 
tests. New York: International Univer. Press, 
1948. 



















Journal of Consulting Psychology 
Vol. 17, No. 3, 1953 


The Levy Movement Test: Suggestions for Scoring 
and Relationship to Rorschach Movement 
Responses 
Robert M. Allen, Charles D. Ray, and Robert C. Poole¹


University of Miami 


At the present time there is little agreement 
in scoring movement determinants imposed by 
the subject on the Rorschach ink blots [1, p. 
18]. Another difficulty related to the move- 
ment response results from the complexity of 
the blots, i.e., the plates do not present a
homogeneous stimulus field but are composed 
of varying degrees of form, shading, chromatic, 
and achromatic variables affecting perception. 
A third problem springs from the relatively 
nondirected nature of the orientation of the 
subject to the Rorschach task. 


Problem 


In studying movement apart from other 
variables found in the Rorschach test situation, 
it would appear paramount to limit the task 
to less complex stimuli. A new and simpler 
instrument, the Levy Movement Cards
(LMC),² draws its value from being less subject
to these difficulties. This new set of
stimuli has been developed for the express pur- 
pose of studying correlates of the movement 
responses in the Rorschach. The plates are 
achromatic, bilaterally symmetrical, finger 
paintings showing humanlike figures in each 
half of the card. Color is absent, ostensibly to 
simplify the stimulus and to hold to a possible
minimum any determinant-complexes arising from the
presence of hue which might inhibit or facilitate
movement-response tendencies. As a further measure in
orienting the task to the perception of movement, the
subject is given the following directions when
presented with the LMC: “I am going to show
you some movement blots. These show people
doing something. Look at the cards in the
position in which I give them to you,³ and tell
me what the people are doing. When you are
through, place the card face down on the table
and I will give you the next one. Do you have
any questions?”


1 The authors are indebted to Dr. Carl Williams,
Department of Psychology, University of Miami, for
his valuable assistance with the statistical phase of
the paper.

2 The Levy Movement Cards were supplied by
the Psychological Corporation for research purposes.
This kindness is greatly appreciated.


In this paper the authors will (a) define 
and illustrate a simplified scoring scheme, and 
(b) compare the different types of movement
concepts given to the LMC with those elicited 
by the Rorschach. In order to compare the 
movement data obtained from these tests a 
scoring system was devised for the Levy move- 
ment responses to parallel the Rorschach. The 
system proposed by Levy [6] and Zubin [9] 
was too complex for handling such data. In 
place of their numerically detailed scales for 
rating compliance, movement, content, and 
psychological nature of the response, the revised
scoring schedule includes the following
elements:


1. Empathy: the subject’s acceptance of the task. 
Rej—rejection of the card, no response given. X— 
response(s) other than movement given. N-Pr— 
movement response not spontaneous. Pr—movement 
response present and spontaneous. 

2. Movement: concept seen in motion. 

F—form or other nonmovement response given. 
OM—inanimate movement response (Klopfer’s m) 
AM—whole animal seen in motion (Klopfer’s FM) 
AdM—detail of animal in movement. HM—whole 
human seen in motion (Klopfer’s M). HdM—de- 
tail of human in movement. 

3. Forces: contributions made by external (E)
and internal (I) forces in the movement response
as seen by the subject; also refers to the level of
movement energy imposed in the concept. These
may be equated with Rorschach flexor and extensor
movement.

E—mechanical, inanimate movement: falling,
floating, balancing (by gravity), being pushed,
blown apart, etc.

E/I—static, positional animal or human movement:
standing, leaning, poised, holding something,
talking, sitting, breathing, etc.

EI—active animal or human movement: walking,
bending, bird flying, fish swimming, etc.

I/E—energetic animal or human movement:
dancing, diving, swimming, running, jumping,
fighting, walking or standing on hands, intercourse,
etc.

I—uncontrolled, fantastic, inanimate movement:
exploding, etc. Movement must be forced from the
internal, otherwise score E.

4. Content: standard scoring as in the Rorschach
test.

5. Popularity and originality: apply Rorschach
standards.

Example: Levy Card VI, “Looks like two people
in bathing suits, diving into a pool of water.”
Movement response is spontaneous, Pr; human
movement seen, HM; movement is energetic, I/E;
clothing and water mentioned, Clo, N. Scored: Pr
HM I/E Clo, N; Rorschach equivalent: W M H.


3 One card, VI, is presented in three different
positions consecutively.


Table 1
Summary of Productivity and Movement Responses for LMC and Rorschach Tests

                   Total         Human         Animal        Other         Total
                   Responses     Movement      Movement      Movement      Movement
Measure            LMC    R      LMC    R      LMC    R      LMC    R      LMC    R
Raw Mean           13.1   26.2   8.8    3.7    6      3.8    2      6      11.4   7.7
Raw Median         11     21     10            0      3      1      0      10     7
Per cent Mean                    78     15     3      14     11     2      92     31
Per cent Median                  90     10     0      12     8      0      92     31.6


Method


In a preliminary study with 25 college students
the above proposed scoring scheme proved
to be adequate for representing and comparing
the nature, content, and level of empathy
of the movement responses in the LMC and
Rorschach protocols. Each subject was administered
the two tests by the individual method
and by the same tester. In every instance the
Rorschach preceded the LMC Test. This procedure
was established because the LMC instructions
definitely set the subject to perceive
human movement. Had these tests been given
in the AB-BA order, the LMC instructions
could conceivably influence the spontaneous
production of movement responses in the Rorschach
plates. The results could be unnecessarily
confounded by this variable.
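
The revised scoring schedule referred to above can be pictured as a small record with one field per scoring element. The following is only a sketch under stated assumptions: the code sets are transcribed from the schedule in the text, but the class name `LevyScore`, its field names, and the validation behavior are hypothetical conveniences and are not part of the authors' procedure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Code sets transcribed from the scoring schedule in the text.
EMPATHY = {"Rej", "X", "N-Pr", "Pr"}
MOVEMENT = {"F", "OM", "AM", "AdM", "HM", "HdM"}
FORCES = {"E", "E/I", "EI", "I/E", "I"}

# Klopfer equivalents that the text names explicitly; no others are stated.
KLOPFER = {"OM": "m", "AM": "FM", "HM": "M"}


@dataclass(frozen=True)
class LevyScore:
    """One scored LMC response (hypothetical container, not from the paper)."""
    empathy: str
    movement: str
    forces: Optional[str] = None      # assumed optional for nonmovement (F) responses
    content: Tuple[str, ...] = ()     # Rorschach-style content symbols

    def __post_init__(self):
        if self.empathy not in EMPATHY:
            raise ValueError(f"unknown empathy code: {self.empathy!r}")
        if self.movement not in MOVEMENT:
            raise ValueError(f"unknown movement code: {self.movement!r}")
        if self.forces is not None and self.forces not in FORCES:
            raise ValueError(f"unknown forces code: {self.forces!r}")

    def is_movement(self) -> bool:
        # Every movement code except F denotes a concept seen in motion.
        return self.movement != "F"

    def klopfer_equivalent(self) -> Optional[str]:
        # Returns None where the text gives no Klopfer symbol.
        return KLOPFER.get(self.movement)


# The worked example from the text: Card VI, "two people ... diving".
example = LevyScore("Pr", "HM", "I/E", ("Clo", "N"))
print(example.is_movement(), example.klopfer_equivalent())
```

Keeping the three code sets separate mirrors the schedule's own division into empathy, movement, and forces, so that a tally by any one element needs no parsing of a combined score string.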


Results and Discussion 


An examination of the distributions of the 
responses reveals the nonparametric nature of 
the data and the consequent inability to assume 
normality. Table 1 presents movement-re- 
sponse means and medians based on the abso- 
lute raw number of responses and the ratio of 
movement to total responses (percentage) giv- 
en by each subject on each test separately. The 
total productivity for each test is also indicated. 
The number of raw mean and median total 
responses for the Rorschach is double that for 
the LMC. Only three subjects have more 
than 15 responses in the latter test, none less 
than nine; the limited range of 9 to 14 re- 
sponses includes 84 per cent of the subjects. 
This is not surprising in view of the rigidity of 
the LMC directions as compared with the rela- 
tive freedom of the Rorschach instructions. The 
LMC directions seem to imply that giving a 
human movement percept relieves the subject 
of the need for further response to the card. 
This is not so with the Rorschach test. 

Again Table 1 reveals a dramatic story: 
the movement story. For this selected population,
as a group, movement was a major determinant
in 92 per cent of the LMC responses
and in 31 per cent of the Rorschach
associations. As indicated above, this should be 
the expected direction of variance because of 
the specific movement-perception set imparted 
by the LMC instructions. This is supported by 
an analysis of the movement responses into 
three types: human, animal, and other. Using 
the median percentage data (to mitigate the 













distorting effects of extreme ratios), it may be 
seen from Table 1 that LMC human-move- 
ment percentage greatly exceeds Rorschach 
movement, 90 per cent to 10 per cent human- 
movement associations, respectively. In keeping 
with the attitudinal set, animal responses are 
minimal in the LMC while the Rorschach 
elicits a median of 12 per cent FM. Again one 
must turn to the LMC drawings for a possible 
explanation—animal engrams are not as readi- 
ly elicited because of the instructions and the 
easily recognized humanlike quality of the 
finger paintings, much more so than the nebu- 
lous ink blots. For future research, then, there 
remains the problem of evaluating the ration- 
ale behind the animal responses in the LMC 
Test which represents a definite departure from 
specific instructions. 

Other movement concepts are consistently 
higher for LMC than for the Rorschach. An 
analysis of the contents of the subject’s re- 
sponses again points to the structure of the in- 
dividual LMC plates which determines such 
associations as “wind blowing,” “rain falling,” 
and “whirling” movements — concepts which 
are readily engendered by the chiaroscuro tones
of the finger paintings which are suggestive of
natural phenomena. To recapitulate: Table 1
discloses (a) data related to total productivity
for the LMC and Rorschach tests, and (b)
an analysis of the three types of movement as- 
sociations into human, animal, and other. 
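
The four rows of Table 1 (raw and percentage, mean and median) can be illustrated with a short computation. This is a minimal sketch, assuming each subject's protocol is reduced to a pair of counts; the counts below are invented for illustration and are not the study's data, and the function name `summarize` is hypothetical.

```python
from statistics import mean, median

# Hypothetical (movement responses, total responses) per subject.
# These are NOT the study data; they only illustrate the arithmetic.
protocols = [(10, 11), (9, 12), (12, 12), (3, 10), (11, 14)]


def summarize(pairs):
    """Return raw and percentage means and medians for movement responses."""
    raw = [movement for movement, _ in pairs]
    # Percentage is the ratio of movement to total responses for each subject.
    pct = [100.0 * movement / total for movement, total in pairs]
    return {
        "raw_mean": mean(raw),
        "raw_median": median(raw),
        "pct_mean": mean(pct),
        "pct_median": median(pct),
    }


summary = summarize(protocols)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

In this invented sample one subject with a very low ratio pulls the percentage mean below the percentage median, which is the kind of distortion the text has in mind when it prefers the median percentage data.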


The question which flows logically from the
findings presented in Table 1 is: what is the
relationship between movement associations of
the LMC and Rorschach tests? Two nonparametric
statistics are utilized in order to approach
the answer to this question: rank correlation,
rho, and Wilcoxon’s method for individual
comparisons by the ranking method
[2, 4, 7, 8]. Table 2 discloses the rho coefficients
between the types of movement responses
obtained on the LMC and Rorschach tests and
the correlation between the total number of
responses for both tests. None of these values
is significant for estimating a definite relationship
between the two variables [3, p. 314].
One cannot state, if such an hypothesis entered
into the considerations of the LMC proponents,
that a high number of movement associations
in the LMC Test is necessarily accompanied
by increased movement responsiveness
in the Rorschach ink blots, or vice versa. The
same hypothesis is untenable for total productivity
on both tests. These findings suggest
further that there is no acceptable means of
predicting, for the individuals within the study
group, to what extent productivity and movement
engrams in one test will be reflected in
the other.


Table 2
Correlation between LMC and Rorschach
Movement and Total Responses

Response             Rho
Human movement       +.20
Animal movement      -.07
Other movement       -.08
Total movement       +.24
Total responses      +.50


Table 3
Significance of LMC and Rorschach
Movement Percentages

Response             Diff.    p
Human movement       63       <.01
Animal movement      11       <.01
Other movement       9        <.02
Total movement       61       <.01
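
As an illustration of the rank correlation reported in Table 2, the sketch below computes rho as the product-moment correlation of the two sets of ranks, with tied values given their average rank. It is an assumption-laden example: the function names are hypothetical, the scores are invented, and the paper does not say how ties were handled in the original computation.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Rank correlation: Pearson correlation applied to the ranks.

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical movement counts for five subjects on each test.
lmc_counts = [1, 2, 3, 4, 5]
rorschach_counts = [2, 1, 4, 3, 5]
rho = spearman_rho(lmc_counts, rorschach_counts)
print(round(rho, 2))
```

With no ties this agrees with the familiar formula 1 - 6Σd²/(n(n² - 1)); for the invented scores above Σd² = 4 and n = 5, so rho = 1 - 24/120 = .80.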





An examination of the raw percentage distributions
of the responses leaves the impression
that, as a group, there are some significant
differences in LMC and Rorschach responsiveness.
The percentage data were analyzed
for significance in accordance with
Wilcoxon’s method [7, 8].⁴ The probabilities are
presented in Table 3. The implications are 
clear: there are significant differences in move- 
ment-determined concepts for the group on the 
LMC and Rorschach tests. Human-, other-, 
and total-movement responses are elicited more 
prominently by the LMC; animal-movement 
associations are predominantly engendered by 
the Rorschach ink blots. These results support 
the data reported in the earlier portion of the 
paper. 

4 This is a nonparametric test based on rank
order and is known as Wilcoxon’s T test. No
assumption of normality of distribution is made.
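
The paired-comparison statistic described in the footnote above can be sketched as follows. This is an illustration only: the percentages are invented, the function name `wilcoxon_t` is hypothetical, and the handling of zero differences (dropped) and tied differences (average ranks) follows common practice rather than anything stated in the paper. The resulting T would then be referred to a table such as Wilcoxon's [8].

```python
def wilcoxon_t(first, second):
    """Wilcoxon's T for paired scores: the smaller of the two signed-rank sums.

    Returns (T, n), where n is the number of nonzero differences.
    """
    diffs = [a - b for a, b in zip(first, second) if a != b]
    magnitudes = sorted(abs(d) for d in diffs)

    # Assign each distinct magnitude the mean of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < len(magnitudes):
        j = i
        while j + 1 < len(magnitudes) and magnitudes[j + 1] == magnitudes[i]:
            j += 1
        rank_of[magnitudes[i]] = (i + j) / 2 + 1
        i = j + 1

    positive = sum(rank_of[abs(d)] for d in diffs if d > 0)
    negative = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return min(positive, negative), len(diffs)


# Hypothetical movement percentages for six subjects on each test.
lmc_pct = [90, 85, 100, 70, 95, 60]
rorschach_pct = [10, 20, 15, 75, 5, 30]
t_stat, n_pairs = wilcoxon_t(lmc_pct, rorschach_pct)
print(t_stat, n_pairs)
```

Here five of the six differences favor the LMC and the single contrary difference is also the smallest in magnitude, so the negative ranks sum to 1 out of a possible 21.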







Conclusions 


Can the LMC Test be employed as a means 
for checking the Rorschach movement respon- 
siveness of a subject? It would appear that the 
present findings, with this group of subjects, 
militate against such an hypothesis. All that 
may be concluded is: (a) The Levy Move- 
ment Card Test elicits a significantly higher 
percentage of movement responses than does 
the Rorschach Psychodiagnostic Test. (b)
The LMC Test is less productive of total re- 
sponses than the Rorschach. (c) While the 
above two conclusions hold for this study popu- 
lation as a whole, prediction for an individual 
subject’s total and specific type of movement 
responsiveness cannot be made. Thus, it is 
difficult to evaluate high or low productivity 
by a subject in one test in terms of productivity 
in the other test. (d) As a final opinion, pending
further research, the authors believe that
the greater degree of structure in the LMC 
instructions and the nature of the drawings 
themselves could account for the results ob- 
tained. The authors suggest that much more 
work has to be done with the LMC Test be- 
fore it may justifiably be employed as a check 
on Rorschach movement responsiveness. 







Received September 22, 1952. 


References 


1. Allen, R. M. Student’s Rorschach manual. Mi- 
ami, Fla.: Dept. of Psychology, Univer. of Mi- 
ami, 1950, mimeographed. 


2. Friedman, M. The use of ranks to avoid the 
assumption of normality. J. Amer. statist. Ass., 
1937, 32, 675-701. 


3. Guilford, J. P. Fundamental statistics in psy- 
chology and education. (2nd Ed.) New York: 
McGraw-Hill, 1950. 


4. Moses, L. E. Non-parametric statistics for psy- 
chological research. Psychol. Bull., 1952, 49, 
122-143. 


5. Rust, R. M. Some correlates of the movement 
response. J. Pers., 1948, 16, 369-401. 


6. Rust, R. M. IIL. The Levy Movement Cards: 
EPA Round Table. J. Pers., 1948, 17, 153-156. 


7. Wilcoxon, F. Individual comparisons by ranking
methods. Biometrics Bull., 1945, 1, 80-82.


8. Wilcoxon, F. Probability tables for individual 
comparisons by ranking methods. Biometrics, 
1948, 3, 119-122. 


9. Zubin, J., & Young, K. M. Manual of projective
and cognate techniques. Madison, Wis.:
College Typing Co., Univer. of Wisconsin, 
1948, multigraphed. 





Journal of Consulting Psychology 
Vol. 17, No. 3, 1953 


Prognosis in Paranoid Schizophrenia on the Basis
of the Rorschach¹


David Grauer 


Veterans Administration Hospital, Hines, Illinois 


Previous published prognostic investigations 
of the Rorschach on schizophrenics have made
no systematic attempt to isolate the paranoid 
subtype for special study. Evidence in the liter- 
ature, however, suggests that the paranoid sub- 
type is distinct from others and therefore 
merits consideration as a special group [2, 11, 
14, 22, 24]. For this reason, the present study 
was confined to paranoid schizophrenics. 

Several investigations of the Rorschach as a 
prognostic instrument in schizophrenia have 
been reported [8, 9, 13, 15, 16, 17, 19, 20, 
21]. Each of the studies has discovered combinations
of Rorschach signs which significantly
distinguished improved from unimproved schiz- 
ophrenics. There are, however, many incon- 
sistencies and contradictions in the specific 
signs found to be prognostic, even among the 
investigations of the psychologist who has con- 
tributed most to this literature [19, 20, 21]. 
A recent review by Windle [25] summarized 
the results of many of these prognostic studies. 

A definite explanation for the diversity of 
these findings cannot, of course, be given in the 
present state of our knowledge. Nevertheless, 
it is clear that a majority of the investigations 
are deficient in many respects. In the first 
place, prognostic signs were established on a 
purely empirical basis by comparing Rorschach 
scores of improved and unimproved schizophrenics.
There is a striking paucity of hypotheses
in these studies. Since most of the patient
samples are small and almost no cross
validation is reported, it is obvious that, as
Cronbach [5] has recently emphasized, the
reliability of the results is open to question.
Finally, in view of the fact that the groups
differ from one another in age, sex, intelligence,
subtype classification, etc., it is perhaps
not surprising that different prognostic signs
were obtained.


1 Reviewed in the Veterans Administration and
published with the approval of the Chief Medical
Director. The statements and conclusions published
by the author are the result of his own study and do
not necessarily reflect the opinion or policy of the
Veterans Administration.

The writer is indebted to the following for their
assistance and encouragement: Dr. Donald W. Fiske
and Dr. James G. Miller, University of Chicago,
and Dr. Roy Brener, Veterans Administration
Hospital, Hines, Ill.


The hypotheses for the present study were 
derived mainly from the prognostic investiga- 
tions of schizophrenics by psychiatrists. Among 
the favorable prognostic signs most commonly 
reported in the literature [3] were a relatively 
well-adjusted prepsychotic personality, and the 
presence of affective features in the clinical 
picture. These observations form the principal 
basis for our operational hypotheses. 


Problem 


The aim of the present study was to in- 
vestigate the value of the Rorschach in pre- 
dicting the outcome of treatment of paranoid 
schizophrenics. It was assumed that the con- 
flicting results obtained in previous prognostic 
investigations were due, in part, to the inade- 
quate control of various factors. Among the 
most important of these factors, it was be- 
lieved, is that of subtype classification. In the 
present study only patients diagnosed as para- 
noid schizophrenics were included. 

This investigation differed from most of the 
other prognostic studies in at least one signifi- 
cant respect. No attempt was made to arrive 
at a combination of signs to distinguish im- 
proved from unimproved groups on a purely 
empirical basis. The purpose of all of the 
group comparisons made here was to test cer- 
tain hypotheses or predictions, and at the same 












time to re-examine certain prognostic signs 
cited by previous investigators. 

The main reliance in this study was on rela- 
tively “objective” signs and techniques of 
measuring Rorschach variables. The author is 
aware of objections to this procedure that may 
be raised by clinicians, and of the limited 
validity and reliability of the “scales” used. 
Nevertheless, it is felt that such methods are 
most feasible in the present state of knowledge, 
and in view of the similar techniques applied 
in previous research in this area. 

One chief hypothesis was that schizophrenics 
who improve should reveal evidence of a great- 
er degree of anxiety in their Rorschach records 
than those who fail to improve. This hy- 
pothesis was based on an inference from one 
of the most frequently noted clinical observa- 
tions, that schizophrenics who improve are 
more likely to manifest affective reactions 
(presumably including anxiety) than those 
who fail to improve. Furthermore, it seems 
reasonable that anxiety in a schizophrenic is 
indicative of an active struggle with the psy- 
chotic process. Such an attempt to regain con- 
tact with reality is in sharpest contrast to the 
apathy and withdrawal exhibited by typical 
deteriorated schizophrenics. 

Another major hypothesis was that schizo- 
phrenics who improve should give evidence of 
having possessed a relatively better adjusted 
prepsychotic personality than unimproved pa- 
tients. This would conform to the conclusion 
of Klopfer and Kelley regarding prognostic 
signs in the Rorschach: “The more the pre- 
treatment record approaches ‘normal,’ the 
better the prognosis” [14, p. 368]. It is also 
in agreement with results of investigations of 
prepsychotic histories of schizophrenics. 

A minor hypothesis considered was that im- 
proved patients should show less evidence of 
organic brain damage than unimproved. This 
supposition is based on the belief held by Bel- 
lak [4] and others that schizophrenia may be 
either of organic or psychogenic origin, or may 
involve varying degrees of each factor, and that 
the more favorable prognosis would be expect- 
ed in those cases in which there are fewer or- 
ganic factors. 


Method 
To test the hypothesis that anxiety is a favor- 


able prognostic factor, Elizur’s Rorschach con- 
tent test [7] was employed. This test was 
designed to measure degree of anxiety and 
hostility manifested by the content of Ror- 
schach responses. In view of the reliability 
and validity reported by Elizur, this device 
was considered appropriate. 

Since it was believed that measures of 
“neurotic tendency” as indicated by Rorschach 
responses would give another indication of 
anxiety, the “neurotic signs” of Miale and 
Harrower-Erickson [10] were also used in 
this study. 


To test the prediction that improved schizo- 
phrenics would show more nearly “normal” re- 
sponses in their pretreatment Rorschach than 
unimproved patients, the following Rorschach 
factors were compared: total number of re- 
sponses, distribution of whole and detail re- 
sponses, number of human movement, color, 
shading, good form, space, human, and anato- 
my responses. As these components have been 
those most frequently compared in past studies, 
it is possible to relate the results to those of 
other investigators. 


In addition, the data were analyzed for the 
most recent Rorschach prognostic signs report- 
ed by Phillips [17] and Piotrowski [21]. 
Records were scored for Phillips’ “Color
Drive,” a scale purporting to measure “affective
drive” based on the subjects’ color responses.
The Rorschach protocols were also
scored for the six Piotrowski signs: “Variety,” 
“Generic Term,” “Evidence,” “Color Re- 
sponse,” “Indirect Color Approach,” and “De- 
murring.” These terms are defined and illus- 
trated in the last of Piotrowski’s reports [21]. 


Source of data and subjects. All clinical 
records of patients diagnosed as paranoid schizophrenics
who had pretreatment Rorschach tests
during 1947 and 1948 and were treated at 
Hines VA Hospital were examined. In select- 
ing the records for the present study, the fol- 
lowing criteria were used rigorously: 


1. All patients included had a final diagnosis of 
“paranoid schizophrenia” after at least three months 
of hospitalization. This diagnosis was made by two 
psychiatrists, one of whom was a consultant. (In 
the Veterans Administration Hospitals a “consul- 
tant” is a specialist who is affiliated with a univer- 
sity and has attained a recognized standing in his 
field.) 



















2. There was positive evidence in the case record 
that the patient had hallucinations or bizarre de- 
lusions before treatment. 

3. Patients were classified as “improved” if they 
were discharged from the hospital as in remission or 
improved and gave evidence that they no longer 
experienced delusions or hallucinations. The “un- 
improved” group consisted only of patients who 
were recommended for transfer to an institution for 
long-term hospitalization. 


In addition to meeting the criteria de- 
scribed, each patient in the improved group 
was matched in age with a patient in the un- 
improved group. Out of 140 Rorschach rec- 
ords of paranoid schizophrenics, 36 patients 
were found, 18 improved and 18 unimproved, 
who fulfilled all these requirements. All the 
patients are males and veterans of World War 
II; all received some form of shock treatment
(insulin or electroshock) during the period
from 1947 to 1949.

The two groups were similar not only in 
age but also in IQ and in years of schooling. 
Mean ages for the two groups were 28.1 and 
28.3; mean IQ’s, 98.8 and 101.4; mean years 
of schooling, 9.2 and 10.7. No statistically sig- 
nificant differences in intelligence or in educa- 
tion were found. In both groups a small 
number had histories of previous hospitaliza- 
tion for mental illness: seven of the improved 
and four of the unimproved. In addition to 
shock treatment, four patients in each group 
were also given some form of psychotherapy. 
The only important difference between the 
two groups was in duration of hospital treat- 
ment, the unimproved group remaining almost 
twice as long as the improved group. This is 
related to the fact that more of the unimproved 
than of the improved patients received both 
insulin and electroshock therapy. Patients who 
fail to benefit from one form of shock therapy 
are often given another form. 

The Rorschach examinations were ad- 
ministered and scored or supervised by experi- 
enced clinical psychologists. In each case the 
examinations were given prior to treatment. 
The protocols were typed and all scoring and 
identifying information of both patient and 
original examiner were removed. The records 
were then rescored independently by psychol- 
ogists other than the Rorschach administrator. 
There were thus two independent scores for 
every Rorschach response. The present writer 




checked each record in detail and, where dif- 
ferences in scoring occurred, made the final 
decision. The rescoring was done, of course, 
without knowledge as to whether the record 
belonged to the improved or unimproved pa- 
tients. The records were scored in accordance 
with Beck’s [3] system. Animal movement 
responses and form percentages as defined by 
Klopfer [14] were added for special scoring 
purposes. 
Results 


Rorschach scores as prognostic indicators.
Table 1 presents the mean pretreatment scores
of improved and unimproved patients on
various Rorschach components. The only significant
differences are as follows: The improved
patients produced fewer total responses,
higher W%, and fewer D’s than did the unimproved
group. The other differences were
statistically insignificant. The trend of the remaining
differences may be summarized as follows:
Improved patients exhibited fewer
human movement, fewer shading, and a smaller
number of color responses than unimproved
patients. Since the latter factors depend to
some extent on total number of responses, these
results were not entirely unexpected. The
smaller number of total responses given by
improved patients, together with the trend
toward a restriction of color and movement
responses, suggested that Rorschach records of
improved patients tend to resemble the records
of neurotic more than they do those of normal
individuals. The prediction that improved
patients would produce more nearly “normal”
records than unimproved patients is thus not
borne out by the data. This finding is opposed
to the evidence from many previous investigations
and to the previously cited conclusion of
Klopfer and Kelley [14].


Table 1
Mean Scores of 18 Improved and 18 Unimproved
Paranoid Schizophrenics on Rorschach Components

Component    Improved    Unimproved    Sign test†    t
                                       (sigma)
R            17.3        29.1          2.12          2.60*
W            6.6         5.7           0.71          0.78
W %          42.6        25.6          2.59*         2.60*
D            9.6         17.3          2.36*         3.23**
D %          51.6        60.9          1.89          1.37
Dd           1.1         6.1           1.65          2.00
Dd %         5.9         13.6          1.42          1.97
S            0.9         2.9           0.71          1.92
M            1.3         2.2           1.42          1.78
FY           1.2         1.8           1.65          1.52
Sum Y        1.3         2.3           0.24          0.92
FC           0.5         1.0           1.89          1.47
CF           1.1         1.8           1.18          1.22
Sum C        2.1         3.1           0.47          1.26
F plus %     77.1        70.9          0.94          1.39
H            1.3         2.4           0.94          1.59
H %          12.1        17.9          0.47          1.24
An           1.7         2.0           0.71          0.41
An %         8.9         6.8           0.24          0.63

* Significant at the .05 level of confidence.
** Significant at the .01 level of confidence.

† The “sign test,” described by Johnson [12], does not
make any assumption regarding normality of distribution
(as does the t test). Since many of the Rorschach
variables may not be normally distributed, it was decided
to use both tests of significance. These sigma values
are approximations, based on the “normal curve.” Since
the data used in computing signs are not continuous,
significance was established by referring to the table of
Dixon and Massey [6].

“Neurotic signs” as prognostic indicators. In 
order further to test the hypothesis that im- 
proved patients would demonstrate more evi- 
dence of neurotic tendency than unimproved 
patients, the revised check list of Harrower- 
Erickson [10] was applied to the Rorschach 
data. Improved patients scored higher on the 
Harrower-Erickson check list of “neurotic 
signs” than unimproved patients: improved patients’
score, 6.97; unimproved patients’ score,
5.64. The difference between the two groups, 
however, was not statistically significant. The 
general trend of the data was, nevertheless, in 
the direction of confirming the hypothesis, and 
consistent with the restricted Rorschach rec- 
ords of the improved patients and with the re- 
sults to be described in the next section. 

Anxiety and hostility as prognostic signs. 
One of the principal hypotheses was, it will 
be recalled, that improved patients would show 
greater anxiety than unimproved patients. 
Elizur’s [7] Rorschach content test was select- 
ed to measure anxiety. Since this test also is 
designed to measure degree of hostility on the 
basis of content, the latter variable was also 
scored for both groups on an exploratory basis. 

Since the scores on the RCT are calculated 
by adding the scores of separate Rorschach re- 
sponses, it can be seen that there is a direct re- 
lationship between total number of responses 


and total anxiety or hostility scores. In his 
report Elizur indicated that only records con- 
taining a specified range of scores were used. 
In view of the disparity in total number of 
responses between the two groups, it was neces- 
sary to make a correction by equalizing the 
total response scores, by totaling the separate 
scores and dividing by the total number of re- 
sponses for each Rorschach record. The re- 
sults were then expressed in terms of average 
anxiety or hostility score per response. 
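
The correction just described amounts to dividing each record's summed content score by its number of responses. The sketch below illustrates only that arithmetic; the per-response values are invented, Elizur's actual scoring weights are not reproduced here, and the function name `score_per_response` is hypothetical.

```python
def score_per_response(response_scores):
    """Average content score per response for one Rorschach record."""
    if not response_scores:
        raise ValueError("record contains no responses")
    return sum(response_scores) / len(response_scores)


# Two hypothetical records (NOT study data): one short, one long.
short_record = [1, 0, 1, 0]                      # total 2 over 4 responses
long_record = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]     # total 3 over 10 responses

print(sum(short_record), score_per_response(short_record))
print(sum(long_record), score_per_response(long_record))
```

In this invented case the longer record has the higher total score but the lower score per response, which is why uncorrected totals would favor whichever group gives more responses.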

The results were as follows: The mean 
anxiety score per response for the improved 
patients was .44; for the unimproved, .23. The 
t ratio of 4.02 is significant at the .01 level of 
confidence. According to the “Sign Test,” 
(See Note, Table 1), the significance also 
reaches the .01 level. In 15 of 18 comparisons, 
improved patients scored higher on the anxiety 
scale than unimproved patients. 
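
The sign-test result for 15 of 18 matched comparisons can be checked with an exact binomial calculation. This is a sketch under assumptions: it supposes that none of the 18 pairs was tied and that a two-sided test is intended, neither of which the text states, and it is not the tabular procedure of Dixon and Massey that the author actually used. The function name is hypothetical.

```python
from math import comb


def sign_test_two_sided(successes, n):
    """Exact two-sided sign-test probability under a 50:50 null hypothesis."""
    k = max(successes, n - successes)
    # Probability of a split at least as lopsided as k of n, in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


p_value = sign_test_two_sided(15, 18)
print(round(p_value, 4))
```

The one-directional tail contains 816 + 153 + 18 + 1 = 988 of the 262,144 equally likely sign patterns, so the two-sided probability is about .0075, in keeping with the .01 level reported in the text.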

The mean hostility score per response for 
improved patients was .07; for the unimproved,
.17. The t ratio was 2.00, which approaches
the .05 level of confidence. On the basis of 
the “Sign Test” it is significant at the .05 
level. The improved patients showed less 
hostility than the unimproved group on this 
scale. 

The prediction that improved patients would 
exhibit more anxiety in their Rorschach records 
than unimproved patients was, therefore, ful- 
filled. An additional finding was that improved 
patients also show a definite trend toward a 
lower degree of hostility than unimproved pa- 
tients. 

Prognostic value of “Organic signs.” One 
hypothesis suggested by the literature is that 
schizophrenics who fail to recover should show 
more evidence of brain damage than those who 
do. To test this hypothesis, the data were 
evaluated in terms of Piotrowski’s [18] “Or- 
ganic signs.” The results failed to confirm this 
hypothesis. The mean number of organic 
signs for the improved was 3.17; for the un- 
improved, 2.67. The difference is not statisti- 
cally significant. 

Prognostic signs found by Phillips and Pio- 
trowski. An attempt was made to analyze the 
Rorschach data of our paranoid schizophrenics 
on the basis of Piotrowski’s [21] six prognostic 
signs (Variety, Generic Term, Evidence, Color 
Response, Indirect Color Approach, and De- 
















murring). In spite of Piotrowski’s detailed 
description of these signs, the present writer 
experienced considerable difficulty in making 
judgments regarding their existence in many 
cases. Nevertheless, it is felt that, on the 
whole, the results are fairly reliable. 

The attempt to analyze our Rorschach data 
by means of Piotrowski’s six signs yielded negative
results. The improved patients’ records
contained a mean of 4.3 signs; the unimproved, 4.1 signs.
All but one patient in each group attained 
three or more signs, in contrast to Piotrowski’s 
finding that most of his improved schizo- 
phrenics gave three or more signs, while very 
few of his unimproved patients did so. Fur- 
thermore, none of the six signs distinguished 
the improved from the unimproved patients at 
a statistically significant level. 

Similarly negative results were obtained 
when Phillips’ three signs were applied to the 
Rorschach data. These three signs were: 
“Demurring” (one of Piotrowski’s six signs), 
W% between 15 and 25, and “Color Drive” 
score. There was no difference in the W% 
criterion, four patients in each group respond- 
ing with W’s between 15 and 25%. The total 
“Color Drive” score for the improved group 
was 2.8; for the unimproved, 2.7. Scores per 
response (correcting for disparity in total 
number of responses) likewise showed no dif- 
ference between the groups: mean score per 
response for the improved group being .10; 
for the unimproved, .09. 

Thus, neither the prognostic signs of Phil- 
lips nor those of Piotrowski successfully dis- 
tinguished the improved from the unimproved 
paranoid schizophrenics. 


Discussion 


The finding that evidence of anxiety in the
Rorschach records of paranoid schizophrenics 
is a favorable sign is in keeping with the clini- 
cal observations of psychiatrists. As already 
mentioned, the presence of affective factors in 
the clinical picture is one of the most frequent- 
ly noted characteristics of schizophrenics who 
later recover [4]. Since anxiety would be con- 
sidered an affective reaction, the results con- 
firm clinical-psychiatric data on prognostic in- 
dicators. Perhaps anxiety is a favorable in- 
dicator because it implies a struggle with the 
psychotic process. Such a condition contrasts 


markedly with the inertia, blunting of affect, 
and general apathy characteristic of chronic 
and deteriorated schizophrenics. 

Previous Rorschach prognostic studies rare- 
ly mention anxiety as a prognostic sign. Never- 
theless, some of the signs found by Piotrowski 
and Phillips, though they are not designated 
as such by these authors, could be interpreted 
as indicative of anxiety. Piotrowski’s [21] 
description of two of his signs, “Evidence” and 
“Demurring,” suggests that basically these 
factors may be an expression of anxiety. “Evidence”
is scored when the subject indicates he
is weighing the evidence regarding adequacy 
of response by using terms expressing doubt. 
“Demurring” relates to the subject’s holding 
back responses in order not to commit himself 
to interpretations of which he is not certain. 

Similarly, many features of Phillips’ [17]
“Color Drive Scale” are identical with 
Elizur’s criteria for scoring anxiety in the 
Rorschach. Both scales give positive weights 
to responses like “blood,” “fire,” and others 
suggesting stress and danger. 

The other positive finding, that hostility
is an unfavorable sign, has also not specifically
been mentioned in the literature of Rorschach
investigators. Two recent clinical
studies of delusions of schizophrenics do, 
however, tend to substantiate this result. 
Seitz [23] and Albee [1], in separate investi- 
gations, reported that there is greater likeli- 
hood for improvement if a schizophrenic’s delu- 
sions manifest more intrapunitive than extra- 
punitive mechanisms. This finding would at 
the same time corroborate the previous conclu- 
sion that anxiety is prognostically favorable. 

The second major hypothesis—that Ror- 
schach signs pointing to a relatively good pre- 
psychotic adjustment indicated a favorable out- 
come—was not confirmed by the data. This 
lack of confirmation of previous Rorschach 
findings requires some explanation. It should 
be noted that, while there is considerable vari- 
ation in specific signs cited as prognostic by 
different investigators, there seems to be a defi- 
nite tendency for “favorable” Rorschach signs 
to be associated with a good prognosis in the 
majority of the investigations. Why should 
the paranoid group show a reversal of this 
trend? 

One possible explanation lies in the nature 







of the main positive finding, that anxiety is 
a favorable sign. As might be expected, anxie- 
ty resulted in a constricted Rorschach record 
for most of the improved patients. Such a 
record would show lowered productivity and 
consequently a reduction or elimination of 
those movement or color responses that reveal 
the more positive aspects of personality. In 
other words, the effect of anxiety would be to 
inhibit the expression of favorable personality 
factors. Perhaps the paranoid group, being the 
subtype in which control is the most rigid [2], 
is the one where the presence of anxiety, if it 
exists, would be the most conspicuous favor- 
able sign. In other varieties of schizophrenia, 
where there is less intellectual control, there 
may be a greater opportunity for release of 
those responses in the Rorschach which are in- 
dicative of inner resources and capacity for 
emotional rapport. 

In view of the ill-defined nature of the diag- 
nostic classification of schizophrenia, there is 
a great need for precise descriptions of the 
particular group selected for observation, as 
well as of the type of treatment given. Better 
measures of results of treatment are also re- 
quired. Although the present data fall far 
short of ideal in all these respects, the investi- 
gation was limited to a fairly well-defined sub- 
group of schizophrenia with specified behavior- 
al characteristics. Confirmation of the results, 
which is necessary if reliance is to be placed on 
them, should come from a repetition of this 
type of investigation on a similar group of 
paranoid schizophrenics. 


Summary 


This investigation dealt with the prognostic 
value of the Rorschach in paranoid schizo- 
phrenia. It was felt that previous prognostic 
studies were inadequate because of failure to 
control various factors, particularly that of 
subtype classification. The chief aim of this in- 
vestigation was to test certain hypotheses re- 
garding prognosis from Rorschach records of 
paranoid schizophrenics. 

The pretreatment Rorschach records of 36 
diagnosed paranoid schizophrenics, who were 
subsequently given shock treatment, were 
analyzed to determine the presence of prog- 
nostic signs that would distinguish the im- 
proved from the unimproved patients. The 


sample consisted of 18 improved and 18 unim- 
proved subjects matched, by pairs, for age. 
There was no statistically significant difference 
between the two groups in IQ and in educa- 
tional level. 

The following results were obtained: The 
pretreatment Rorschach records of improved 
paranoid schizophrenics contained evidence of 
a greater degree of anxiety than did the records 
of unimproved patients. Mean anxiety score 
per response on Elizur’s Rorschach content 
test differentiated the two groups at a statisti- 
cally significant level. Improved patients also 
manifested significantly less hostility than un- 
improved patients. There was some tendency, 
though not statistically significant, for im- 
proved patients to exhibit more “neurotic 
signs” in their Rorschach records than unim- 
proved patients. On the average, Rorschach 
records of improved paranoid schizophrenics 
tended to be less productive and to contain 
fewer movement and color responses than those 
of the unimproved group. The latter results 
are in harmony with the previously cited find- 
ings of greater anxiety in the improved pa- 
tients. 

No difference was found between improved 
and unimproved patients in total number of 
Piotrowski’s “organic signs” in their Rorschach
records. An effort to confirm specific
Rorschach prognostic signs found by previous 
investigators yielded negative results. 

An attempt was made to rationalize the posi- 
tive findings in this study and to integrate 
them with previous prognostic indicators re- 
ported by psychiatrists and clinical psychol- 
ogists. 


Received November 3, 1952. 


References 


1. Albee, G. W. The prognostic importance of delusions in
schizophrenia. J. abnorm. soc. Psychol., 1951, 46, 208-212.

2. Alexander, F. Fundamentals of psychoanalysis. New York:
Norton, 1948.

3. Beck, S. J. Rorschach’s test. Vol. I. New York: Grune &
Stratton, 1946.

4. Bellak, L. Dementia praecox. New York: Grune & Stratton,
1948.

5. Cronbach, L. J. Statistical methods applied to Rorschach
scores: A review. Psychol. Bull., 1949, 46, 393-429.

6. Dixon, W. J., & Massey, F. J. An introduction to statistical
analysis. New York: McGraw-Hill, 1951.

7. Elizur, A. Content analysis of the Rorschach with regard to
anxiety and hostility. Rorschach Res. Exch., 1949, 13, 247-284.

8. Graham, Virginia T. Psychological studies of hypoglycemia
therapy. J. Psychol., 1940, 10, 327-358.

9. Halpern, Florence C. Rorschach interpretation of the
personality structure of schizophrenics who benefit from insulin
therapy. Psychiat. Quart., 1940, 14, 826-833.

10. Harrower-Erickson, Molly R. The value and limitations of
the so-called “neurotic signs.” Rorschach Res. Exch., 1942, 6,
109-114.

11. Henderson, D. K., & Gillespie, R. D. A textbook of
psychiatry. New York: Oxford Univer. Press, 1944.

12. Johnson, P. O. Statistical methods in research. New York:
Prentice-Hall, 1949.

13. Kisker, G. W. A projective approach to personality patterns
during insulin-shock and metrazol convulsive therapy. J. abnorm.
soc. Psychol., 1943, 37, 120-124.

14. Klopfer, B., & Kelley, D. McG. The Rorschach technique.
Yonkers, N.Y.: World Book Co., 1942.

15. Morris, W. W. Prognostic possibilities of the Rorschach
method in metrazol therapy. Amer. J. Psychiat., 1943, 100,
222-230.

16. Pacella, B. L., Piotrowski, Z., & Lewis, N. D. C. The
effects of electric convulsive therapy on certain personality
traits in psychiatric patients. Amer. J. Psychiat., 1947, 104,
83-91.

17. Phillips, L. Personality factors and prognosis in
schizophrenia. Unpublished doctor’s dissertation, Univer. of
Chicago, 1949.

18. Piotrowski, Z. The Rorschach inkblot method in organic
disturbances of the central nervous system. J. nerv. ment. Dis.,
1937, 86, 525-537.

19. Piotrowski, Z. Prognostic possibilities of the Rorschach
method in insulin-treated schizophrenics. Psychiat. Quart., 1938,
12, 679-689.

20. Piotrowski, Z. A simple experimental device for the
prediction of outcome of insulin treatment in schizophrenia.
Psychiat. Quart., 1940, 14, 267-273.

21. Piotrowski, Z. The Rorschach method as a prognostic aid in
the insulin-shock treatment of schizophrenics. Psychiat. Quart.,
1941, 15, 807-822.

22. Roe, Anne, & Shakow, D. Intelligence in mental disorder.
Ann. N.Y. Acad. Sci., 1942, 42, 361-490. Also in S. S. Tomkins
(Ed.), Contemporary psychopathology. Cambridge: Harvard Univer.
Press, 1943. Pp. 348-354.

23. Seitz, P. F. D. A dynamic factor correlated with the
prognosis in paranoid schizophrenia. Arch. Neurol. Psychiat.,
Chicago, 1951, 65, 604-606.

24. Wittman, Phyllis. The use of the multiple choice Rorschach
as a differential diagnostic tool. J. clin. Psychol., 1945, 1,
281-287.

25. Windle, C. Psychological tests in psychopathological
prognosis. Psychol. Bull., 1952, 49, 451-482.








Journal of Consulting Psychology 
Vol. 17, No. 3, 1953 


A Comparison of Scores on the Index of Adjustment
and Values with Behavior in
Level-of-Aspiration Tasks¹


Robert E. Bills 


University of Kentucky 


Many writers have expressed the opinion 
that level of aspiration, obtained from experi- 
mental studies, samples a type of behavior 
which is characteristic of subjects’ goal-setting 
behavior in general. In the experimental tasks, 
a subject is given practice and, after being in- 
formed of his level of performance, is asked to 
state his level of aspiration for the next trial. 
This is the most common method for assessing 
level of aspiration. Another possible method of 
measuring level of aspiration is by determining 
the discrepancy which exists between the con- 
cept of self and the concept of the ideal self. 
The latter type of aspiration is a resultant of 
the self percepts and the ideals held by a per- 
son, and may or may not be comparable with 
the level of aspiration represented by the simple 
tasks usually used in level-of-aspiration studies. 
One of the purposes of the present study was 
to investigate the extent of the relationship 
between level of aspiration as determined by 
motor and verbal tasks and personal level of 
aspiration as determined by the Index of Ad- 
justment and Values [3]. 


The Index of Adjustment and Values. The 
Index of Adjustment and Values requires that 
a subject make three ratings, on a five-point 
scale, for each of 49 traits. These ratings have 
been grouped into three measures: concept of 
self, acceptance of self, and concept of the ideal 
self. The discrepancy between concept of self 
and concept of the ideal self is a measure of 
the level of aspiration set by a subject in respect
to his ideals. The Index has been shown
to be valid as a measure of emotionality [2, 9],
and is capable of distinguishing personality
types [1].

1 This study was made possible through a grant
by the University of Kentucky Research Fund. The
author wishes to express his appreciation to the
committee in charge of the fund and to Glen E.
Roberts who worked as assistant on the project.

Hypotheses of the study. It was predicted, 
on the basis of the reasoning described above, 
that personal level of aspiration as revealed 
by the Index would be significantly correlated 
with level of aspiration as determined by ex- 
perimental tasks of a motor and verbal char- 
acter. 

Eysenck [4], in a study of the level-of-as- 
piration behavior of hysterics and dysthymics, 
demonstrated that his hysterical cases were 
more variable in the levels of aspiration they 
set than were his dysthymic cases. 

A Rorschach study [1] of high and low 
scorers on acceptance of self as measured by 
the Index of Adjustment and Values showed 
that high and low scorers constituted two dis- 
tinct personality groups, and that the low 
scorers showed more rigid and less variable 
Rorschach records than did the high scorers. 
On the basis of this result and Eysenck’s find- 
ings it was predicted that a group of subjects 
with acceptance-of-self scores below the popu- 
lation mean of the Index would be less vari- 
able in the level of aspiration they set than 
would a group characterized by acceptance-of- 
self scores above the population mean of the 
Index. 

It should be possible in a controlled setting, 
such as a level-of-aspiration experiment, to 
demonstrate relationships between personality 
measures and observations of behavior. In a 
previous study [3] it was shown that accept- 
ance of self is related to direction of perceived 
threat. Subjects who scored above the popula- 




tion mean of acceptance of self as measured 
by the Index of Adjustment and Values 
blamed factors outside themselves for unhap- 
piness they had experienced, but subjects below 
the mean blamed themselves. It was predicted, 
therefore, in the present study that subjects 
who scored above the mean on acceptance of 
self, when questioned about their performance, 
would respond differently from subjects who 
were below the mean on acceptance of self. 
More specifically, it was hypothesized that: 
(a) Subjects who score below the population 
mean on acceptance of self, as shown by the 
Index, will underestimate their performances 
more often than subjects who score above the 
mean on acceptance of self; (b) subjects who
score above the population mean on acceptance
of self, when asked to recall their level of performance
after an interpolated task, will remember
their performance as being above their
true performance level and subjects who score 
below the mean will recall their level of per- 
formance as being lower than it was; (c) sub- 
jects who are below the population mean on 
acceptance of self, when asked, “How do you 
feel about your performance?” will express 
negative, internalized attitudes while subjects 
who are above the population mean will ex- 
press positive, externalized attitudes. 


Design


Thirty female subjects were selected, from 
among students at the University of Kentucky, 
on the basis of discrepancy scores on the Index 
of Adjustment and Values. The selected sub- 
jects had scores distributed from high to low 
on discrepancy. 

Each subject was given, individually and in 
one session, five level-of-aspiration tasks with 
the order of presentation of the tasks random- 
ized to prevent serial effects. The tasks were 
selected to represent a variety of motor and 
verbal activities and mixtures of the two. The 
five tasks included: 1, dart throwing; 2, Rotter 
target aspiration board [11] as modified by 
Lonstein [8]; 3, marking out letters; 4, sub- 
stituting letters; and 5, addition. 

On the dart-throwing task each subject was 
given five darts, and the score was the number 
of points scored on all five darts. The modified 
Rotter board was a board with a groove to 
accommodate a marble, on which depressions 


were made at regular intervals to stop the 
marble. The center depression scored six and 
depressions on either side were on a descending 
scale to one point. Each subject rolled the 
marble five times for each trial. In the mark- 
ing-out test, the subject was given a paragraph 
and instructed, “Mark out as many e’s as you 
can. You have two minutes.” The substitu- 
tion test consisted of a series of letters with 
a blank space beneath each. At the top of the 
page was a key. Subjects were instructed to 
place below each letter the correct letter according
to the key. Two minutes were allowed
on each series. The arithmetic tests
were series of five two-place numbers. The 
digits were arranged in a different order in 
each series but the problems on all the series 
contained the same combinations. The time 
limit was two minutes. At the beginning of 
each task the subject was requested not to at- 
tempt to add his own score. 

In each task the subject was given a series 
of practice trials and then informed of his score 
on the last practice trial. He was then told, 
“The next trial is a test. How much do you 
expect to make?” The subject was told his 
score on the test trial and his performance was 
recorded. He was then asked, “How much 
do you think you should make the next time?” 
Performance was recorded and the subject was 
asked, “How much do you think you scored?”
The subject was then told, “You scored. .. . 
This is your score on the . . . test. How do 
you feel about your performance?” The sub- 
ject’s remarks were recorded verbatim. 

After the five tasks had been completed, the 
subject was handed a mimeographed page and 
instructed, “Read this as rapidly and care- 
fully as possible.” The experimenter pretend- 
ed to make notes and to time the performance. 
This exercise was included to separate the 
tasks from recall of performance. After the 
subject had completed the reading of the ex- 
cerpt, he was asked to recall his performance 
on each task. Recall was recorded and cer- 
tainty of recall was judged on a five-point scale 
by the experimenter. 


Results 


The experiment yielded data of four types 
including: level-of-aspiration scores, estimates 
of performance, recall of performance, and attitude
toward performance. These will be discussed
below.


Index scores and level of aspiration. Sub- 
jects were ranked on the two Index scores and 
according to level of aspiration shown by each 
of the five tasks. Level of aspiration was de- 
fined operationally to be the difference between 
the first test performance and the goal set by 
the subject for his second test performance. 
Rank-order correlation coefficients for accept- 
ance-of-self scores and discrepancy scores of the 
Index of Adjustment and Values and the five 
measures of level of aspiration are given in 
Table 1. The data in Table 1 show that the 
two scores of the Index are not highly corre- 
lated with level of aspiration in the five meas- 
ures used. Only two of the correlations were 
significantly different from zero at the .05 
level or less. The correlation of discrepancy 
score and mark-out test was significantly dif- 
ferent from zero at the .05 level of confidence 
and the correlation of acceptance-of-self scores 
and Rotter board scores was significantly dif- 
ferent from zero at the .01 level. 
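
The operational definition and the rank-order coefficient described above can be sketched as follows. This is a minimal illustration, assuming that level of aspiration is taken as goal minus first test performance and that tied scores receive their mean rank; the five subjects and all their scores are hypothetical, not data from the study.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_order_correlation(x, y):
    """Spearman rank-order correlation: Pearson r computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for five subjects on one task.
first_test_scores = [20, 25, 18, 30, 22]
stated_goals = [24, 26, 25, 31, 22]
# Level of aspiration: goal for the second test minus first test performance.
level_of_aspiration = [g - s for g, s in zip(stated_goals, first_test_scores)]
discrepancy_scores = [40, 22, 55, 18, 10]  # hypothetical Index discrepancy scores

rho = rank_order_correlation(discrepancy_scores, level_of_aspiration)
```

With 30 subjects, as in the present study, the same two functions would be applied once for each pairing of an Index score with a task.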


Table 1
The Rank-Order Correlation Coefficients between the
Index Scores and Level of Aspiration
Shown by the Experimental Tasks

Tasks           Acceptance of Self    Discrepancy
Darts                 .18                -.24
Rotter                .51**              -.15
Addition              .08                -.07
Substitution          .13                 .04
Mark-out              .01                 .36*

* Significantly different from zero at the .05 level.
** Significantly different from zero at the .01 level.


Intercorrelations of level-of-aspiration tasks. 
In order to compare levels of aspiration as 
shown by each of the five experimental tasks 
the rank orders for all of the tasks were inter- 
correlated. The intercorrelations are low but 
with the exception of the dart task, which did 
not correlate significantly with any of the other 
tasks, are significantly different from zero at 
the .05 level or less. The intercorrelations of 
the level-of-aspiration tasks appear to be higher 
than the correlations of the Index scores and 
the level-of-aspiration scores. The intercorre- 
lations of the level-of-aspiration tasks are con- 
tained in Table 2. 





Table 2
Rank-Order Intercorrelations among the Five
Experimental Tasks

                Rotter    Addition    Substitution    Mark-out
Darts            -.01       -.08          .02            .30
Rotter                       .39*         .39*           .44*
Addition                                  .49**          .43*
Substitution                                             .39*

* Significantly different from zero at the .05 level.
** Significantly different from zero at the .01 level.


Index scores and variability of level of as- 
piration. The distributions of levels of aspira- 
tion in the five tasks were converted into 
standard scores so that scores on the tasks 
would be comparable. This manipulation re- 
sulted in five distributions of standard scores, 
one for each task. Each of these five distribu- 
tions was divided into two groups dependent 
upon the subject’s score (above or below the 
mean) on acceptance of self on the Index of 
Adjustment and Values, and into two other 
groups according to discrepancy scores above 
and below the population mean on the Index. 
Variances of each of the pairs of distributions 
were compared by means of the F test. These 
tests show that on two of the five tasks, subjects 
with acceptance-of-self scores above the popu- 
lation mean of the Index were significantly 
more variable in the levels of aspiration they 
expressed than were subjects who had accept- 
ance-of-self scores below the mean. Likewise, 
subjects with discrepancy scores below the 
mean were more variable than subjects with 


Table 3
Variabilities of Levels of Aspiration when Subjects
Are Divided According to Acceptance-of-Self
Scores on the Index of Adjustment and Values

                        Variance
                Acceptance    Acceptance
                 of Self       of Self
                  Above         Below
                  Mean*         Mean
Task              N=13          N=17          F        p
Darts             1.188         1.414       1.190     N.S.
Rotter             .913         1.036       1.135     N.S.
Mark-out          1.612          .653       2.469     .05
Substitution      1.419          .196       7.240     .001
Addition           .822         1.190       1.448     N.S.

* Acceptance-of-self scores on the Index of Adjustment
and Values above the mean of the standardization population
of the Index.










scores above the mean. Acceptance-of-self 
scores on the Index and discrepancy scores, 
when correlated, yield coefficients around 
—.75. These data are presented in Tables 3 
and 4. 


Table 4
Variabilities of Levels of Aspiration when Subjects
Are Divided According to Discrepancy Scores
on the Index of Adjustment and Values

                        Variance
                Discrepancy   Discrepancy
                  Below         Above
                  Mean*         Mean
Task              N=16          N=14          F        p
Darts             1.185         1.400       1.181     N.S.
Rotter            1.722          .331       5.202     [illegible]
Mark-out          1.068          .914       1.168     N.S.
Substitution       .478         1.061       2.220     N.S.
Addition          1.528          .538       2.840     .05

* Discrepancy scores on the Index of Adjustment and
Values below the mean of the standardization population
of the Index.
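
The F values in Table 3 can be recovered from the tabled variances by placing the larger variance of each pair in the numerator. The sketch below assumes that convention, and it assumes that variances printed without a decimal point in the scanned table (913, 196, 822) stand for .913, .196, and .822; the function name is illustrative.

```python
def variance_ratio(var_a, var_b):
    """F ratio with the larger variance in the numerator."""
    larger, smaller = max(var_a, var_b), min(var_a, var_b)
    return larger / smaller


# (task, variance above mean, variance below mean, reported F) from Table 3.
table_3 = [
    ("Darts", 1.188, 1.414, 1.190),
    ("Rotter", 0.913, 1.036, 1.135),
    ("Mark-out", 1.612, 0.653, 2.469),
    ("Substitution", 1.419, 0.196, 7.240),
    ("Addition", 0.822, 1.190, 1.448),
]

recomputed = {task: round(variance_ratio(a, b), 3) for task, a, b, _ in table_3}
```

Each recomputed ratio agrees with the tabled F to three decimals, which supports the decimal placement assumed here.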


Acceptance of self and estimate of perform- 
ance. Subjects were asked to estimate their 
performance on each of the tasks as soon as 
they had completed each task. A dichotomous 
chi square was calculated using the mean ac- 
ceptance-of-self score of the standardization 
population of the Index to form one dichotomy 
and the number of tasks in which performance 
was underestimated as the other dichotomy. 
The latter dichotomy was formed by dividing 
the number of underestimates of all the sub- 
jects at the median number of underestimates. 
This placed all subjects with four or five 
underestimated performances in one category 
and subjects with three, two, or one under- 
estimated performance in another category. 
All subjects underestimated at least one per- 
formance. Two subjects who were above 
the mean on acceptance of self underestimated 
four or five performances as compared with 
ten subjects below the mean, while ten sub- 
jects who were above the mean on accept- 
ance of self underestimated three, two, or 
one performance as compared with eight 
subjects below the mean. This combination 
yielded a chi square of 4.535 which is sig- 
nificantly different from zero at the .05 
level, and may be interpreted to mean that 


estimate of performance is related to accept- 
ance of self. 


Acceptance of self and recall of performance. 
At the end of the experiment, following the 
reading of the excerpted paragraphs, each sub- 
ject was asked to recall his scores on each of 
the tasks. The number of overestimates, cor- 
rect estimates, and underestimates was calcu- 
lated for each subject and the subjects were 
divided into “overestimators” and “underestimators.”
The one subject who had more correct
than faulty recalls was eliminated from
consideration. These two groups formed one
dichotomy of a double dichotomy. The other 
division was formed by dividing the subjects’ 
acceptance-of-self scores at the mean accept- 
ance-of-self score of the standardization popu- 
lation of the Index. 


Eight subjects with acceptance-of-self scores 
above the mean were classed as “overesti- 
mators” while only four subjects below the 
mean were included in this category. Four 
subjects with acceptance-of-self scores above 
the mean were included as “underestimators”
but 13 subjects with scores below the mean 
were grouped in this division. This combina- 
tion produced a chi square of 5.396 which is 
interpreted at the .05 level to show that recall 
of performance is related to acceptance of self. 
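
The double-dichotomy chi squares reported for estimate and recall of performance can be recomputed from the frequencies given in the text. The sketch below assumes the uncorrected formula for a fourfold table (no continuity correction); the article does not name the formula, so this is an inference from the arithmetic rather than a statement of the author's method.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: acceptance of self above / below the standardization mean.
# Columns: "overestimators" / "underestimators" in recall.
recall_chi2 = chi_square_2x2(8, 4, 4, 13)

# Columns: four or five underestimates / one to three underestimates.
estimate_chi2 = chi_square_2x2(2, 10, 10, 8)
```

Under this assumption the recall table gives 5.396, as reported, and the estimate table gives about 4.54, which differs from the reported 4.535 only in the third decimal.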

Acceptance of self and certainty of recall. 
Each subject was rated on the degree of cer- 
tainty for each of his five recalls. Subjects 
were divided according to acceptance-of-self 
scores above and below the standardization 
mean and certainty of recall using the median 
of the averages for the 30 subjects as the divid- 
ing point. A double dichotomous chi square 
was calculated but was not significant. 

Acceptance of self and attitude toward per- 
formance. Each subject was asked to state how 
he felt about his performance in each of the 
tasks. Lists of the verbatim comments of the 
subjects were given to five judges who were 
asked to classify each statement.² The judges
decided whether the attitude expressed in the
statement was positive or negative and whether
it was directed toward or away from self. In
this way five judges rated five statements for
each subject. The subjects were then categorized
according to the classification which
appeared most frequently as negative-internal,
negative-external, positive-internal, or positive-external.
A double dichotomous chi square
was calculated using negative and positive attitudes
as one dichotomy and scores above and
below the population mean on acceptance of
self as the other dichotomy. One subject with
an acceptance-of-self score above the mean expressed
a negative attitude toward his performance
compared with 12 subjects below the
mean, but 12 subjects with acceptance-of-self
scores above the mean demonstrated positive
attitudes while only 5 subjects below the mean
showed positive attitudes. The chi square produced
by this combination was 11.868, which
is significant at less than the .01 level. Thus,
type of attitude expressed toward performance
in the level-of-aspiration tasks is related to
acceptance of self.

2 The author wishes to express his thanks to Virginia
Parko, Thomas Sutherland, Travis Rawlings,
and Glen E. Roberts who served as judges.


A chi square was calculated using acceptance 
of self as one dichotomy and direction of ex- 
pressed attitude as the other dichotomy. Eight 
subjects with acceptance-of-self scores above 
the population mean directed their attitudes 
about their performance toward themselves 
and 5 directed it away from themselves, while 
16 subjects below the mean directed their at- 
titudes toward themselves and 1 directed it 
away from himself. This chi square was equal 
to 4.887, which is significant at the .05 level, 
and shows that acceptance of self is related to 
the direction of expressed attitude. 
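
The significance levels quoted for the two attitude chi squares can be checked by converting each value to a probability. The sketch below assumes one degree of freedom, as for any fourfold table, and uses the identity that the upper-tail probability of chi square with one degree of freedom equals erfc(sqrt(chi2 / 2)); the function name is illustrative.

```python
from math import erfc, sqrt


def chi_square_p_1df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


p_attitude = chi_square_p_1df(11.868)   # positive vs. negative attitude
p_direction = chi_square_p_1df(4.887)   # attitude toward vs. away from self
```

The first probability falls below .01 and the second falls between .01 and .05, in agreement with the levels stated above.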


Discussion 


The intercorrelations of the Index and the 
experimental tasks. Eysenck has stated: 


The actual size of the correlation [between two 
levels-of-aspiration tasks] would appear to depend 
on three points: 


(1) The similarity of the two tests, as regards 
the ability measured. 

(2) Similarity of the scale on which goodness of 
performance is measured. 


(3) Similarity of the experimental situation, i.e., 
whether the two tests are given in the same 
experimental session or not [4, p. 133]. 


Studies with normal subjects have reported 
correlations between level of aspiration on dif- 
ferent tests ranging from .29 by Gould [7] 
to .70 by Frank [5] with an estimated median 
of .40 to .45. 


The intercorrelations of the five tasks in- 


cluded in the present study approximate the 
average of the reports in the literature and are 
about as high as might be expected when ex- 
amined in the light of Eysenck’s statement. The 
tasks were purposely selected to represent a 
variety of motor and verbal performances, and 
only the last one of Eysenck’s three generaliza- 
tions was applicable to this experimental situa- 
tion, namely, that all of the experimental tasks 
were given in one session. 

The data show that the Index scores were 
not highly correlated with level of aspiration 
as measured by the five tasks, but the correla- 
tions approach the size of the intercorrelations 
among the five experimental tasks. Several 
possible explanations of the low degree of in-
tercorrelation between the tasks, and the cor-
relations of the Index scores with the tasks
appear tenable: (a) The Index is not a valid
measure of level of aspiration. (b) Ego in-
volvement in a task is an important variable
in determining the usefulness of a task as a 
measure of level of aspiration. (c) There may 
be more than one type of level of aspiration 
and each of the tasks and the Index may be 
measures of general and specific types of level 
of aspiration. (d) The level of aspiration as 
measured by the Index may be a more exact 
measure than the levels shown by the experi- 
mental tasks. These possibilities will be dis- 
cussed below. 

In regard to the first point, the Index of Ad- 
justment and Values has not been validated 
as a measure of level of aspiration, but experi- 
mental tasks which are commonly used as 
measures of this variable have not been valid- 
ated for this purpose either. It seems logical 
to assume, however, that the discrepancy score 
of the Index is as valid a measure of level of 
aspiration as the usual experimental tasks. The 
subject in filling out the Index is asked on 49 
items, after he has made a rating of his present 
status, to tell how he would like to be. This 
process is equivalent to obtaining 49 estimates 
of his level of aspiration. This is essentially 
the same process by which level of aspiration 
is determined in experimental tasks. 

On the second point, several authors [6, 
11] have noted that the more a subject sees 
a level-of-aspiration task as a “game,” the less 
likely will be a high correlation between this 
task and other measures. This may mean that
the greater the extent of ego involvement in
a task, the more valid the task will be as a 
measure of level of aspiration. The Index 
seems eminently fitted to the requirements of 
an ego-involved measure. All of the ratings on 
the Index are done by the subject, on himself, 
and from his own frame of reference. 

It seems reasonable to assume that the per- 
sonal level of aspiration of female college stu- 
dents would contain a large verbal factor, and 
thus, it becomes of greater interest that the 
highest relationship discovered between dis- 
crepancy scores and level of aspiration in the 
present study was shown in the “mark-out” 
test which is probably the most verbal task of 
the battery. The results indicate the possibility 
of the interaction of level of aspiration and con- 
cept of self. It is possible that the validity of 
level-of-aspiration measures is at least partially 
dependent upon the concept of self of the sub- 
ject. Level of aspiration shown by a task is 
possibly more highly related to personal level 
of aspiration when performance in the task is 
seen by the subject as enhancing to self-or- 
ganization. The importance of personal fac- 
tors in determining level of aspiration has been 
demonstrated in numerous experiments. 

The third and fourth points cannot be 
settled in this paper, but are probably worthy 
of further study involving factor analysis and 
validation with external criteria. “Would the
Index or the usual type of level-of-aspiration
tasks show higher agreement with external
criteria?” is a question which might be in-
vestigated profitably.

The Index and variability of performance. 
Studies [1, 3] of the Index of Adjustment and 
Values have shown that the mean acceptance- 
of-self score of the standardization population 
separates two groups which are distinctly dif- 
ferent in personality characteristics. Rorschach 
examination revealed that high scorers on ac- 
ceptance of self have more variable personali- 
ties than low scorers, and the data from the 
present study provide additional evidence for 
this conclusion. In the two of the five tasks 
where the difference in the variability of the 
groups was statistically significant, the high 
scorers constituted a more variable group than 
the low scorers. 

Acceptance-of-self scores on the Index are 
negatively and highly correlated with discrep-
ancy scores [3]. When the variabilities of
level-of-aspiration behavior of high and low 
scorers on discrepancy were compared, the low 
scorers were significantly more variable on two 
of the tasks than were the high scorers. These 
two tasks were not the same two which were 
significantly different when the division was 
made on the basis of acceptance of self, which 
may indicate that although the two Index 
scores are highly correlated, they are probably 
measures of different variables. 

The Index and other observed behavior. 
The Index served as a better predictor of other 
behavior recorded in the experiment than it 
did for level of aspiration. Attitude toward 
self was shown to be related to attitude toward 
performance, direction of expressed attitude 
toward performance, estimate of performance, 
and recall of performance. These findings 
support the conclusion of other writers [10, 
12] that attitude toward self is an important 
determinant of behavior. 


Summary 


Thirty volunteer, female subjects were test- 
ed with the Index of Adjustment and Values 
and five level-of-aspiration tasks. Subjects set 
levels of aspiration for each of the five tasks, 
estimated their performance in the tasks, ex- 
pressed comments regarding their performance, 
and after a filled interval attempted to recall 
their performances. 

It was concluded that Index scores were, 
to a low degree, related to level of aspiration, 
as measured by the experimental tasks, that the 
variability of the level of aspiration set by 
groups selected by the Index was significantly 
different, and that acceptance of self shown 
by the Index was significantly related to atti- 
tude toward performance, direction of ex- 
pressed attitude toward performance, estimate 
of performance, and recall of performance. 


Received July 23, 1952. 


References 


1. Bills, R. E. Rorschach characteristics of per- 
sons scoring high and low in acceptance of 
self. J. consult. Psychol., 1953, 17, 36-38. 

2. Bills, R. E. A validation of changes in scores 
on the Index of Adjustment and Values as 
measures of changes in emotionality. J. con- 
sult. Psychol., 1953, 17, 135-138. 







3. Bills, R. E., Vance, E. L., & McLean, O. S.
An index of adjustment and values. J. consult.
Psychol., 1951, 15, 257-261.

4. Eysenck, H. J. Dimensions of personality. Lon-
don: Kegan Paul, 1947.

5. Frank, J. D. Individual differences in certain
aspects of level of aspiration. Amer. J. Psy-
chol., 1935, 47, 119-128.

6. Frank, J. D. The influence of the level of per-
formance in one task on the level of aspiration
in another. J. exp. Psychol., 1935, 18, 159-171.

7. Gould, R. An experimental analysis of “level
of aspiration.” Genet. Psychol. Monogr., 1939,
21, 1-116.

8. Lonstein, M. A comparative study of level of
aspiration variables in neurotic, psychopathic,
and normal subjects. Unpublished doctoral dis-
sertation, Univer. of Kentucky, 1952.

9. Roberts, G. E. A study of the validity of the
Index of Adjustment and Values. J. consult.
Psychol., 1952, 16, 302-304.

10. Rogers, C. R. Client-centered therapy. Boston:
Houghton Mifflin, 1951.

11. Rotter, J. B. Level of aspiration as a method
in the study of personality. I. A critical review
of methodology. Psychol. Rev., 1942, 49, 463-
474.

12. Snygg, D., & Combs, A. W. Individual behav-
ior. New York: Harper, 1949.










Journal of Consulting Psychology 
Vol. 17, No. 3, 1953 


Directionality of Lines in the Bender-Gestalt Test 
Roland M. Peek 


Hastings State Hospital¹


Although the use of the Bender-Gestalt test 
(B-G) at all age groups has increased in both 
clinical and educational settings, interpreta- 
tional techniques with demonstrated validity 
have been glaringly lacking. Interpretation of 
B-G protocols has for the most part been high- 
ly subjective and consequently dependent upon 
the skill and range of experience of individual 
examiners whose administration and manner of 
interpretation may vary widely from that of 
other “experts” dealing with other populations. 
In most cases the communication of the person- 
al-experience type of skill (assuming its va- 
lidity) is hampered and distorted by the lack 
of objective and definitive terms with which to 
describe the behavior occurring in response to 
the stimulus figures. 


One of the first needs in the study of the 
B-G, therefore, has been for some sort of scor- 
ing system which would clearly define each 
variable in objective terms so that the scoring 
would become satisfactorily reliable, therefore 
providing variables which were subject to ex-
perimentation; the scoring must also be simple
enough as to be applicable to the needs of the 
time-pressed clinician. Any scoring system 
should be capable of this dual function. This 
means that the scoring could not be totally 
subjective as were pioneer systems [1, 3], nor 
yet so minutely objective that scoring time pro- 
hibited the use of the test [2]. Another diffi- 
culty which has occurred is that some systems 
function on a level which excludes much psy- 
chologically meaningful data, or combines vari- 
ables which are not psychologically compatible, 
thus restricting the potential usefulness of the 
test both clinically and experimentally [5]. 

1 This study was made at the VA Hospital An-
nex, Ft. Snelling, St. Paul, Minnesota.


In the course of developing and validating
a scoring technique aimed at a resolution of
these problems [6], it was noted that a par-
ticular aspect of B-G behavior seemed to be of
clinical significance, though not included in any 
scoring system previously published: the order 
and directionality used by the subject in draw- 
ing the parts of each figure. Such data can be 
included in a sheet containing general observa- 
tions of the subject’s comments, handedness, 
attitude, etc. during the testing.² It has seemed
apparent that deviations from popular modes
of drawing the figures have psychological sig-
nificance independently of the shape of the re-
sulting drawing, and that this information
combined with other variables increases the
range of predictable extra-test behavior and the
accuracy of such predictions. 


Problem 


To test the hypothesis that variations from 
the usual approach reflect individual differ- 
ences in personality, comparison was made be- 
tween a control group and another group from 
the same population who drew the diagonal 
projection of Figure 5 in a specifically deviant 
way, i.e., starting at the top and drawing the 
dots in a series down to the half-circle, rather 
than extending the projection from the half- 
circle outwardly, as most subjects do. 


Subjects 


The control group consisted of 75 hospital- 
ized male neuropsychiatric patients random- 
ized by selecting every 20th discharged case in 
the files of a VA acute-treatment hospital. The 
75 cases in the experimental group, who came 
from the same section of the same hospital, 
were selected solely on the basis of having been 
reported to the author by any of six psychologi-
cal trainees as having drawn the projection on
Figure 5 in the specified direction.³ Since no
investigation was made of the psychological
testing of the controls, it is not known how
many of that group drew Figure 5 as described.


2 The authors of the Peek-Quast scoring system
for the B-G are greatly indebted to Dr. George S.
Welsh for the original suggestion that these addi-
tional data may have potential clinical usefulness.


Procedure 


A check list was devised on which was tallied 
the information listed in Table 1. The source 
of the information was the brief final summary 
prepared by the physician when a patient was 
discharged from the hospital; because of the
nature of this summary, more comprehensive or
more specific data were unfortunately not noted
consistently enough to make their inclusion in
the check list worth while. A given category was
checked if one of the specific words in each 
category was used when referring to the be- 
havior of the patient. The attributes were not 
tallied if described as deriving from the psy- 
chological testing without reference to overt 
behavior. The occurrences of each category 
were expressed in percentages; the statistical
significance of the differences in percentages
was determined by the use of phi coefficients
as described by Jurgensen [4]. 
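
Jurgensen's tables are not reproduced here, but the quantity they tabulate can be illustrated directly. The following is a minimal sketch, assuming the standard fourfold phi coefficient and the relation chi square = N times phi squared; the cell counts are hypothetical and chosen only to resemble two groups of 75 cases each.

```python
import math

def phi_coefficient(a, b, c, d):
    """Fourfold phi for the 2 x 2 table [[a, b], [c, d]]."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator

# Hypothetical tallies (not taken from the study's raw data):
# a category checked for 47 of 75 experimental and 23 of 75 control cases.
exp_yes, exp_no = 47, 75 - 47
ctl_yes, ctl_no = 23, 75 - 23

phi = phi_coefficient(exp_yes, exp_no, ctl_yes, ctl_no)
n = exp_yes + exp_no + ctl_yes + ctl_no
chi2 = n * phi ** 2  # chi square with 1 degree of freedom

print(round(phi, 3), round(chi2, 2))
```

The resulting chi square would then be referred to a table with one degree of freedom to obtain the level of confidence.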


Table 1

Items from Discharge Summary Included in Check List

 1. Diagnosis
 2. Treatment (EST, IST, Other)
 3. Prognosis
    (a) good (b) fair (c) guarded (d) poor
 4. Insight
    (a) good (b) fair (c) partial (d) poor (e) none
 5. Number of somatic complaints
    (a) none (b) one (c) two (d) three (e) four (f) five
 6. Number of somatic complaints (cumulative totals)
    (a) one or more (b) two or more (c) three or more
    (d) four or more (e) five or more
 7. Headache
 8. GI complaints
 9. Cardiac complaints
10. Respiratory complaints
11. Musculature involvement
12. Weakness, dizziness
13. Fatigue
14. Anorexia
15. Weight loss (as presenting complaint, not merely as a
    physical finding)
16. Excessive perspiration (as a presenting complaint)
17. Dermatitis
18. Buzzing or ringing in ears
19. Other somatic complaints
20. Depressed
21. Suicidal thoughts or attempts
22. Blackouts; amnesia
23. Insomnia
24. Tremulous
25. Nervous, tense, restless, anxious
26. Tense (tallied separately)
27. Excessive drinking or addiction to drugs
28. Incapacitated or adjudged incompetent
29. Hallucinations or delusions
30. Impotence; homosexuality
31. Violent fears or dreams directed at patient (see text)
32. Paranoid
33. Seclusive
34. Immature, dependent, inadequate, inferiority feelings,
    passive
35. Impulsive, poor judgment
36. Hostile, aggressive
    (a) readily expressed (b) poorly expressed
37. Tearful
38. Divorce, extreme marital or sexual problem (if evident
    in overt behavior)
39. Parental deprivation or abnormality (death or separation
    of parents before age 12, or overt and severe problems
    such as being beaten regularly by a parent)
40. Number of tallies in Somatic Complaint categories plus
    items 34, 35, 36, 38
    (a) none (b) one (c) two (d) three or less
    (e) six or more (f) seven or more


3 Although subgroups were prepared on the basis
of the direction of drawing the arc, and whether
the projection was drawn first, the number of cases
in each subgroup was too small at the time of the
experiment to permit statistical analysis of the ap-
parent differences between these subgroups.




Results 


Table 2 lists those categories for which the 
difference in percentages reached statistical sig- 
nificance at the .001, .01, .05, and .10 levels 
of confidence, the latter included only to in- 
dicate the general trends running through the 
data. 

No significant differences appear between 
the diagnoses of the two groups, neither in 
terms of specific diagnosis nor by broad psy- 
chotic-nonpsychotic categories. No differences 
are apparent in treatment received nor in the 
degree of insight attained. However, the con-
trol group shows a greater percentage whose
prognosis is described as “good.”







Table 2
Items from Check List (see Table 1)
Differentiating a Random Group of NP Patients
from Patients Drawing Projection of B-G Fig. 5
toward the Arc

(N is 75 for each group)

            Controls     Exp.                     Controls     Exp.
Item           %           %        Item             %           %

Significant at .001 Level           Significant at .01 Level
5(a)          32           5        6(e)             7          24
6(b)          45          83        7               16          37
6(c)          25          63        10               3          15
6(d)          19     [illegible]    11              31          52
20            31          63        15               1          12
34            25     [illegible]    22               5          29
36            25     [illegible]    31         [illegible]  [illegible]
37             3          21        35              11          31
41(d)         69          23        39              17          36
41(e)          5          37        40(a)            9           0
41(f)          3          23        40(b)           17           3

Significant at .05 Level            Significant at .10 Level
3(a)          13           3        3(a)+3(b)       32          20
5(d)           7          19        3(c)+3(d)       68          80
5(f)           3          13        4(a)            11           4
6(a)          68          85        5(b)            23          12
14             9     [illegible]    5(e)            20          33
19            21          37        8               23          36
21            13     [illegible]    [illegible]     16          27
36(a)         23          41        26              37          51
40(c)         21           8        28         [illegible]      27
                                    36(b)      [illegible]  [illegible]





Striking differences are noted in the com- 
plaints listed in the final discharge summary. 
The control group is more likely to have no 
somatic complaints at all, while the experi- 
mental group begins to show a significantly 
greater number as the somatic complaints reach 
three and above. Using cumulative totals, the 
experimental group is more likely to have one 
or more, two or more, three or more, four or 
more, and five or more somatic complaints. 

A difference appears not only in number but 
in the types of somatic complaints. A signifi- 
cantly greater proportion in the experimental 
group has headaches, respiratory ailments, 
complaints involving the musculature, severe 
weight loss, anorexia, and “other somatic com- 
plaints.” 

In other characteristics and complaints noted 
by the psychiatrists significant differences also 
occur. The patients who draw the diagonal of 
Figure 5 toward the half-circle are more often 
characterized as immature, passive, inadequate,
or dependent, and more often exhibit hostile or
aggressive behavior. It is interesting to note 
that the patients in the control group who ex- 
hibit hostility are more likely to be capable of 
direct expression. More evidence for the ap- 
parent inability of the experimental group to 
face directly the source of their frustrations is 
seen in the fact that they show a significantly 
greater incidence of blackouts or amnesia. The 
affect of this group also seems more seriously 
disturbed, as seen in the greater frequencies of 
depression, tearfulness, and suicidal thoughts 
or attempts. That frustrations may always 
have been present for these people is suggested 
by the finding that serious deficit or abnormali- 
ty in the home situation during formative years 
is mentioned with significantly greater fre- 
quency in the experimental group. 

Differences appear also in the fears and 
dreams which reach sufficient intensity as to be 
included in the discharge summary. These 
fears or dreams all seem to represent forces of 
a violent nature impinging upon the patient, 
as illustrated by the following typical state- 
ments: “people planned to shoot him,” “he felt 
there were guns pointing at him in the dark,” 
“dreams of sharks,” “soldiers coming at him,” 
“feeling of a gun being discharged into his 
abdomen,” etc. 

As a very crude measure of the over-all ex- 
tent of involvement, a summation was made 
of the number of complaints and the listed per- 
sonality traits. It was found that the control 
group was more likely to have zero, only one,
only two, or three or less such tallies, while
the experimental group was much more likely 
to have six or more or seven or more tallies. 


Discussion 


It appears from the above experiment that 
differences in personality and even in certain 
personal-history items are reflected in the man- 
ner in which the subject draws the component 
parts of a Bender-Gestalt figure. It is especial- 
ly significant that this should be true even 
when a very specific behavior on a single B-G 
figure (independently of quality of the per- 
formance) is correlated with a criterion as 
brief and variable in its coverage of data as a 
discharge summary. 

It should be pointed out, however, that al-
though statistically significant differences oc-
cur, many of these variables show a large
enough incidence in both groups as to reduce 
the practical importance of such variables. It 
may be more useful, therefore, to consider a 
syndrome or combination of attributes by 
which to differentiate various groups. On the 
basis of this experiment such a behavior pat- 
tern might be postulated for patients drawing 
Figure 5 as described. In general, these pa-
tients appear to be immature and dependent and to
have somewhat less capacity for adjustment than
their ward-mates. They are prone to feel over- 
whelmed by their problems and to be easily 
frustrated. Direct attack upon the sources of 
this frustration seems too threatening; other 
less adaptive mechanisms are resorted to, such 
as blackouts, amnesia, and the many self- 
punishing somatic reactions. The frustrations 
experienced as a result of their inadequacy may 
lead to hostile and aggressive expression, which 
is more likely to be indirect or poorly con- 
trolled. The ineffectiveness may appear insur- 
mountable so that there occurs a greater fre- 
quency of depression, tearfulness, and even 
suicide. Mention has already been made of the 
greater percentage of “good” prognoses in the 
control group. Though significant only at the
.10 level, it is noted that there is a greater
tendency for the patients within this syndrome 
to be completely incapacitated or to have been 
judged incompetent by a court. As pointed 
out earlier, the larger proportion of grossly un- 
favorable factors in the home environment is 
suggestive in the consideration of etiological 
elements. 

For those clinicians interested in the sym- 
bolic value of the B-G figures, the statement 
might be made that the direction in which the 
projection in Figure 5 is drawn may represent 
the impinging forces within the patient’s en- 
vironment, which are perceived as overwhelm- 
ing and punishing, as contrasted with other pa- 
tients whose attack is directed outwardly (the 
same direction in which they draw the pro- 
jection on Figure 5) against their problems. 

What defenses are utilized in the individual 
case cannot be determined by knowledge of 
this one relatively minor aspect of B-G be- 
havior but are dependent upon other findings 
on the test. It is hoped that research now in
progress using a configural approach with ob-
jectively scored factors may increase the use-
fulness of the B-G as a clinical and experimen- 
tal instrument. This experiment seems to in- 
dicate that the order in which lines are drawn 
and the directionality of such lines should be 
included in B-G scoring systems. 


Summary 


Comparison was made between 75 random- 
ly selected neuropsychiatric patients and 75 pa- 
tients from the same population who were 
known to have drawn the diagonal projection 
on Figure 5 of the B-G toward the inverted 
half-circle of dots, rather than in the popular 
manner, i.e., outwardly from the half-circle. 
Overt behavior and traits of the patient and 
certain personal-history items as described in 
the discharge summary were the variables com-
pared in the two groups. The significance of
the differences of percentages of occurrence of
each attribute was tested using phi coefficients.
Twenty-seven variables were found to have 
differences significant beyond the .01 level of 
confidence, while nine variables were signifi- 
cant at the .05 level. On the basis of these 
findings it was suggested that the order and 
directionality used in drawing parts of the B-G
figures be included in data derived by test-
ing with that instrument.


Received September 22, 1952. 


References 


1. Bender, L. A. A visual motor Gestalt test and 
its clinical use. Amer. Orthopsychiat. Ass. Res. 
Monogr., 1938, No. 3. 

2. Billingslea, F. Y. The Bender-Gestalt: an ob- 
jective scoring method and validating data. J. 
clin. Psychol., 1948, 4, 1-27. 

3. Hutt, M. A. A tentative guide for the admin-
istration and interpretation of the Bender-Ges-
talt test. U. S. Army Adjutant General’s School, 
1945 (restricted). 

4. Jurgensen, C. E. Table for determining phi- 
coefficients. Psychometrika, 1947, 12, 17-29. 

5. Pascal, G. R., & Suttell, Barbara J. The Ben-
der-Gestalt test. New York: Grune & Stratton,
1951. 

6. Peek, R. M., & Quast, W. A scoring system for 
the Bender-Gestalt test. Hastings, Minn. (Box 
292) and Minneapolis, Minn. (2810 42nd St.) : 
Authors, 1951. 







Journal of Consulting Psychology 
Vol. 17, No. 3, 1953 


Test-Retest Stability of MMPI Scales for a Psy-
chiatric Population¹,²


Albert Rosen 


Veterans Administration Hospital, Minneapolis, Minnesota 


The study of reliability of personality tests 
involves some problems which are not present 
in the fields of intelligence and achievement 
testing. Reliability defined as accuracy of 
measurement or in terms of a low standard 
error of measurement cannot be evaluated ade- 
quately for personality tests. Internal consis- 
tency measures of reliability are not appropri- 
ate for scales such as those in the MMPI [1]. 
Some of these scales demonstrate considerable 
inhomogeneity, for they were not constructed 
with the intention of achieving internal consis- 
tency. Thus, although the test-retest method 
is more suitable for assessing reliability, the 
temporal changes in test-retest data make it 
more desirable to refer to “stability” rather 
than “reliability.” 


There has been some question as to whether 
a high degree of reliability is desirable in per- 
sonality tests [3]. According to this latter ar- 
gument, the sensitivity of a personality scale to 
changes in an individual, say, through psycho- 
therapy, is much more important than reliabil- 
ity. In intelligence and achievement testing, 
on the other hand, the ideal test is one which 
gives consistent results, regardless of extran- 
eous factors. 


Perhaps the situation might be clarified if 
one were to think in terms of both stability and 
sensitivity. Stability would be measured by the 
consistency of test results over a period of days.
Sensitivity would be evaluated by the lack of
consistency of scores after a period of time,
following some change which had occurred in
the individual as a result of psychiatric therapy
or gross environmental alteration. Both of
these conditions are desirable features of a
personality scale. A score for an individual on
any scale at a given time should be within a
reasonable distance of his “true” score. On the
other hand, the scale should have the capacity
for reflecting modifications in personality with
the advent of radical change in environmental
status.


1 Reviewed in the Veterans Administration and
published with the approval of the Chief Medical
Director. The statements and conclusions published
by the author are a result of his own study and do
not necessarily reflect the opinion or policy of the
Veterans Administration.

2 Read at the annual meeting of the American
Psychological Association, Division of Clinical and
Abnormal Psychology, Washington, D. C., Septem-
ber, 1952.


The purposes of the present study were (a) 
to measure the stability of the individual 
MMPI scales for a psychiatric population over 
a brief test-retest interval by means of product-
moment correlation coefficients, and (b) to
evaluate what changes occur in patients’ 
MMPI scores as a result of hospitalization 
and participation in the hospital regime for a 
few days before they have received much, if 
any, therapy of a formal nature. 
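
The product-moment coefficient named in purpose (a) can be illustrated with a short computation. This is a minimal sketch of the standard Pearson formula; the function name `pearson_r` and the five pairs of scores are hypothetical and are not data from the study.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for five patients on one scale (illustration only).
test_scores = [50, 62, 71, 58, 80]
retest_scores = [52, 60, 75, 55, 78]

r = pearson_r(test_scores, retest_scores)
print(round(r, 3))
```

A coefficient near 1.00 would indicate that patients kept nearly the same relative standing on the scale from test to retest.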


Selection of Subjects 


Every new, testable patient who was admit- 
ted to the psychiatric section of the Minne- 
apolis VA Hospital during an eight-week peri- 
od was given the MMPI. All patients are 
males. Retesting was done within two to seven 
days until forty test-retest cases were obtained. 
It was decided beforehand to exclude from the 
sample the following cases: 


1. Patients retested more than seven days after 
the initial test. The seven-day limit was arbitrarily 
set in order to reduce the effects of psychotherapy 
on scores. 

2. Former patients. Each of these patients had 
had one or more tests during one or more previous 
admissions. It was anticipated that sufficient re-
sistance to retesting would be encountered in these
patients to provide an inadequate measure of sta- 
bility. 

3. Soldiers. Men in active service, when admitted
to the hospital, often possess strong motivation to 
take the test in such a way as to give a very poor 
psychiatric picture, with the hope of obtaining dis- 
charge from the service. 

4. Patients given the group form of the MMPI 
as the initial test. It was decided to restrict the 
study to patients who had taken the individual 
(card) form because of any differences which might 
occur as the result of the patient taking two some- 
what different tests. 


Of the 89 patients admitted to the hospital 
during the eight-week period, the records of 
49 cases could not be used in the study because 
of the four reasons described above, or be- 
cause a test and retest were not available. The 
reasons that these 49 cases were excluded are 
classified below, with the number of cases indi- 
cated for each category. 

1. Initial test unavailable (12): (a) In Psychi- 
atry too brief a period to be tested (5), (b) Too 
disturbed to take test (5), (c) Received electroshock 
therapy almost immediately (2). 

2. Retest MMPI unavailable (37): (a) Former 
patients (14), (b) Initial MMPI given more than 
one week prior to admission to Psychiatry (6), (c) 
Not retested within a week of the initial test (4), 
(d) Group form administered as initial test (3),
(e) Soldier (2), (f) Electroshock therapy given
before retest (2), (g) Refused retest (2), (h) Too 
disturbed for retest (2), (i) Left Psychiatry before 
retest (2). 
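
The counts in the two lists above can be tallied to confirm that they account for the 49 excluded cases and leave the 40 test-retest cases. The following is a small sketch; the dictionary keys are shortened labels introduced here for illustration.

```python
admitted = 89

# Counts as listed under "Initial test unavailable".
initial_unavailable = {
    "too brief a stay": 5,
    "too disturbed": 5,
    "electroshock almost immediately": 2,
}

# Counts as listed under "Retest MMPI unavailable".
retest_unavailable = {
    "former patients": 14,
    "initial MMPI more than a week before admission": 6,
    "not retested within a week": 4,
    "group form as initial test": 3,
    "soldier": 2,
    "electroshock before retest": 2,
    "refused retest": 2,
    "too disturbed for retest": 2,
    "left Psychiatry before retest": 2,
}

excluded = sum(initial_unavailable.values()) + sum(retest_unavailable.values())
retained = admitted - excluded
print(excluded, retained)  # prints 49 40
```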


Procedure for Administration of Tests 
and Retests 


Each patient was individually administered 
the initial test within a few days after admis- 
sion by the psychologist on the psychiatric 
team to which the patient was assigned. Thus, 
only a few cases were given the initial test by 
the writer. Retesting was done within two to 
seven days by the writer. In most cases the 
retest was given individually, but in approxi- 
mately eight cases the patients were seen in 
pairs, and one group of five patients was tested 
at a time. 

Each patient was given the same rationale 
for the retesting, although no attempt was 
made to word it exactly in the same way. Each 
patient was told, in effect, that the retest 
would be of value to the staff and the patient 
in affording a dual base line for comparison 
with any later psychological evaluations which
might be desired for assessing the patient’s
status. However, the patient was told to an- 
swer the items as he felt at the time. The latter 
instruction was designed to counteract any be- 
lief on the patient’s part that he might be ex- 
pected to duplicate his initial test responses. 


Table 1

Test-Retest Time Intervals for the 40 Patients in
the Stability Study

                     Number of Days from—
          Admission to    First test to    Admission to
          first test      second test      second test
Q1            1.3             3.1              5.5
Md            2.8             3.9              7.0
Q3            5.2             4.8              9.8
M             3.6             4.1              7.7
Range         0-12            2-7              2-19





In Table 1 are shown the summary statistics 
on the number of days from (a) date of hos-
pital admission to initial test, (b) initial test
to retest, and (c) hospital admission to retest. 
The Christmas vacation period intervened dur- 
ing this study, so that some of the patients 
were seen for initial testing later than would 
normally occur. Four patients took the first 
test more than seven days after admission. On 
the average, however, patients were tested 
within three days of admission and retested 
within four days. No patient was retested 
within forty-eight hours following the initial 
test in order to avoid contaminating the re- 
sults with a memory factor. 


Description of Patients in the Stability Study 

According to primary psychiatric diagnosis, the sample was composed as shown in Table 2. Data were also recorded on each patient for age, occupational level, years of formal education, marital status, and intelligence. Medians for the group are as follows: age, 32.4 years; education, 11.2 years; occupational level, 4.4.* Of the total group, 32.5% had never been married, and 53.8% possessed an equivalent Wechsler-Bellevue IQ of 110 or higher, as determined from Shipley-Hartford test results. The stability sample demonstrated no significant difference on any of these variables, nor on diagnosis, when compared with a sample of 250 patients-in-general. This latter group represents a cross section of all patients at the same hospital over a two-year period who had taken the MMPI. Therefore, results obtained in this study are most likely representative of the patient population.

* The occupational rating scale devised by Warner, Meeker, and Eells [4] classifies occupations in levels 1 (the highest) through 7.

Table 2

Primary Diagnoses of the Patients in the MMPI Test-Retest Sample

Diagnosis                       Number in Sample   Per cent of Sample
Anxiety reaction                      12                 30.0
Conversion reaction                    3                  7.5
Depressive reaction                    9                 22.5
Somatization reaction                  1                  2.5
  Neurosis total                      25                 62.5
Paranoid schizophrenia                 5                 12.5
Schizophrenia, unclassified            3                  7.5
Paranoid state                         2                  5.0
Involutional melancholia               1                  2.5
  Psychosis total                     11                 27.5
Inadequate personality                 1                  2.5
Alcoholism                             2                  5.0
Convulsive disorder                    1                  2.5
  Other diagnosis total                4                 10.0
All cases                             40                100.0


Test-Retest Stability Coefficients 


For each of the scales, a product-moment 
correlation coefficient was computed from the 
test and retest records of the 40 patients in 
the stability sample. There has been only one 
other study, that of Holzberg and Alessi [2], 
in which the stability of the MMPI scales was 
determined for a psychiatric group over a brief 
test-retest period. However, that investigation 
differed from the present one in the following 
ways: 


1. One-half of their total group of 30 cases re- 
ceived the complete test of 550 items as the initial 
test, and were retested within one or two days with 
only the 350 items which are scored in the original 
scales. 

2. The other 15 cases received the long and 
short form in reverse order within the same time 
interval. 

3. Exactly two-thirds of their cases were psy- 
chotics as compared to only 27.5% in the present 
study. 

4. Their sample apparently consisted of state hos- 
pital psychiatric patients of both sexes. 


The product-moment stability coefficients of the two studies are shown in Table 3. The coefficients from the two studies generally correspond except on L, F, Hs, and Pd. Perhaps the coefficients for L and F are lower in the present study because of the reduced effect of the memory factor over the longer test-retest interval. Any other divergence in the results for the two studies may be due to the differences in diagnostic composition of the two samples. The MMPI scale scores of the Holzberg and Alessi sample of state hospital patients are all considerably lower than those found for the veterans in the present study. The latter are no doubt suffering from comparatively acute rather than chronic mental disorders.

Table 3

Test-Retest Stability Coefficients for MMPI Scales

                 Present Study    Holzberg & Alessi [2]
Scale              (N = 40)            (N = 30)
L                    .62                 .85
F                    .81
K                    .65
Hs                   .85                 .67
Hs + .5K             .86
D                    .80                 .80
Hy                   .88                 .87
Pd                   .88                 .52
Pd + .4K             .87
Mf                   .64                 .76
Pa                   .75                 .78
Pt                   .80                 .72
Pt + 1.0K            .88
Sc                   .83                 .89
Sc + 1.0K            .86
Ma                   .56                 .58
Ma + .2K             .55
Si                   .83

Note. Coefficients of .31 and .36 are significantly different from zero at the .05 level, for N of 40 and 30, respectively.
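The significance thresholds quoted in the note to Table 3 can be checked from the t distribution, since a correlation r with N - 2 degrees of freedom is significant when |r| exceeds t / sqrt(t² + df). The sketch below is only an illustration: the function name is hypothetical, and the two critical t values are taken from standard two-tailed .05 tables rather than from the article.

```python
import math

def critical_r(t_crit: float, n: int) -> float:
    """Smallest |r| that differs from zero, given the critical t for n - 2 df."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Two-tailed .05 critical t values from standard tables (an assumption here):
# df = 38 -> 2.024, df = 28 -> 2.048
r_for_40 = critical_r(2.024, 40)
r_for_30 = critical_r(2.048, 30)

print(round(r_for_40, 2), round(r_for_30, 2))
```

Under these assumptions the thresholds come out near .31 for 40 cases and .36 for 30 cases, so every coefficient in Table 3 exceeds its threshold.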


[Figure: mean T-score profiles (vertical axis, T scores from 30 to 120) across the validity and clinical scales (Hs + .5K, D, Hy, Pd + .4K, Mf, Pa, Pt + 1K, Sc + 1K, Ma + .2K, Si), with the first test and the second test drawn as separate lines.]

Fig. 1. Mean MMPI T-score profiles for the first and second tests (N = 40).

In the present study, the coefficients for L and F no doubt underestimate the degree of relationship between the test and retest records, because of the positive skewness in the L and F distributions of scores. On the clinical scales, the correlation coefficients range from .55 to .88. All the correlations of the clinical scales in common use are equal to or above .80, except Pa and Ma + .2K. The MMPI scales, in general, demonstrate a fairly high degree of stability in an absolute sense, and a high degree of stability relative to what one would anticipate in personality scales.
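For readers who wish to see how a stability coefficient of this kind is obtained, the following minimal sketch computes a product-moment correlation between test and retest raw scores. The six pairs of scores are invented for illustration and are not data from the study; the function name is likewise hypothetical.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical raw scores on one scale for six patients
first_test = [12, 18, 25, 9, 30, 21]
retest = [10, 17, 27, 11, 28, 19]

stability = pearson_r(first_test, retest)
print(round(stability, 2))
```

A coefficient near 1 means that patients kept nearly the same rank order and spacing from test to retest, even if the group mean shifted.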

It is important to consider the factors which 
may have influenced the degree of consistency 
obtained in this study. The following condi- 
tions of administration were not uniform for 
initial test and retest, and would tend to re- 
duce stability. 


1. Variations in motivation from test to retest. 
Some of the patients took the second test somewhat 
reluctantly because they were not strongly convinced 
that it would be of any additional value. 

2. Individual administration of first test and 
group administration of retest for some cases. 

3. Administration of test and retest by different 
examiners. 


On the other hand, the second test was given 
within two to seven days of the first, so that 
memory for some of the items may have con- 
tributed in the direction of enhancement of 
stability. It is difficult to determine how much 
weight can be assigned to each of these condi- 
tions. The first is by far the most important 
factor. Probably the total effect was to some- 
what underestimate stability. 


Changes in Means from Test to Retest 
Clinicians often speculate about what changes occur in patients merely as a result of hospitalization and participation in the hospital regime for a few days before they have received much, if any, formal therapy. The data from this sample provide this information, as it is measured by the MMPI. Table 4 presents the differences in raw score means for the 40 patients in the stability study who were, on the average, tested within three days of admission to the hospital and retested four days after the first test. Figure 1 portrays the test and retest mean T-score profiles based on the raw scores in Table 4.

Table 4

Differences in Raw Score Means on MMPI Scales from Test to Retest (N = 40)

                       Mean                     Standard Deviation
Scale        First Test  Retest  Difference    First Test   Retest
L               4.05      4.35      .30           2.17       2.09
F               7.78      7.48     -.30           5.12       4.75
K              13.08     14.58     1.50*          4.08       4.69
Hs             13.00     11.90    -1.10           6.25       7.26
Hs + .5K       19.82     19.45     -.37           6.25       6.90
D              29.20     29.55      .35           6.27       6.66
Hy             28.50     27.62     -.88           6.38       7.71
Pd             21.50     20.45    -1.05*          5.49       5.50
Pd + .4K       26.65     26.28     -.37           4.86       4.97
Mf             25.48     24.75     -.73           4.20       4.21
Pa             13.00     11.92    -1.08*          4.39       4.04
Pt             23.12     21.02    -2.10*          9.45       9.57
Pt + 1.0K      36.20     35.50     -.70           7.98       8.05
Sc             21.40     18.95    -2.45*         10.94       9.90
Sc + 1.0K      34.48     33.43    -1.05           9.49       8.49
Ma             17.88     18.35      .47           4.30       5.28
Ma + .2K       20.52     21.25      .73           3.87       4.90
Si             34.48     33.25    -1.23          11.00      11.08

* Difference significant at the .05 level.

There are five significant differences in 
means from the first test to the retest. The in- 
crease in K score and the decrease in Pd, Pa, 
Pt, and Sc point to a movement in the “nor- 
mal” direction with an improvement in de- 
fensive structure. After a few days of hospi- 
talization, there is a general over-all decrease 
in MMPI scores for the group as a whole. 
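The article does not say which test produced the asterisks in Table 4. One plausible reading, offered here only as an assumption, is a t test for correlated means, which can be rebuilt from the tabled standard deviations together with the test-retest coefficients of Table 3. The sketch below applies that formula to the K and Hs scales; the function name is hypothetical.

```python
import math

def correlated_means_t(mean_diff, sd_first, sd_retest, r, n):
    """t for a difference between correlated means, from summary statistics."""
    # Variance of the individual difference scores
    var_diff = sd_first ** 2 + sd_retest ** 2 - 2 * r * sd_first * sd_retest
    standard_error = math.sqrt(var_diff / n)
    return mean_diff / standard_error

# K scale: difference 1.50, SDs 4.08 and 4.69 (Table 4), r = .65 (Table 3)
t_k = correlated_means_t(1.50, 4.08, 4.69, 0.65, 40)
# Hs scale: difference -1.10, SDs 6.25 and 7.26 (Table 4), r = .85 (Table 3)
t_hs = correlated_means_t(-1.10, 6.25, 7.26, 0.85, 40)

print(round(t_k, 2), round(t_hs, 2))
```

Against a two-tailed .05 critical value of about 2.02 for 39 degrees of freedom, the K difference exceeds the threshold and the Hs difference does not, which agrees with the pattern of asterisks in Table 4.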


Summary 


1. A group of 40 male, veteran, psychiatric 
hospital patients were administered the indi- 
vidual form of the MMPI on an average of 
three days after admission to the hospital and 
again on an average of four days after the ini- 
tial test. 

2. For clinical scales in common use, test-retest stability coefficients are almost all between .80 and .88.

3. There is an over-all tendency for reduc- 
tion in MMPI scale scores and increase in de- 
fensiveness during the first few days of hospi- 
talization before much, if any, formal therapy 
has been initiated. 


Received September 22, 1952. 


References 


1. Hathaway, S. R. & McKinley, J. C. Manual 
for the Minnesota Multiphasic Personality In- 
ventory. (Rev. Ed.) New York: Psychological 
Corp., 1951. 

2. Holzberg, J. D., & Alessi, S. Reliability of the 
shortened MMPI. J. consult. Psychol., 1949, 13, 
288-292. 

3. Horn, D. Intra-individual variability in the 
study of personality. J. clin. Psychol., 1950, 6, 
43-47. 

4. Warner, W. L., Meeker, M., & Eells, K. Social 
class in America: A manual of procedure for 
the measurement of social status. Chicago: So-
cial Science Research Associates, 1949.





Journal of Consulting Psychology 
Vol. 17, No. 3, 1953 


Perceptual Learning and Age 


Bernard Hanes¹


Ohio State Penitentiary 


Gilbert [1] demonstrated that relationships 
between learning and age are probably a func- 
tion of the method of presentation and meas- 
urement. After matching old and young in- 
dividuals on a vocabulary test he concluded 
that the least amount of deficit occurs in the 
immediate recall of brief new material; the 
greatest deficit occurs in the recall, after brief 
exposure, of more complex material. 


Gilbert’s study also demonstrated that mem- 
ory involving the formation of new associa- 
tions was extremely difficult for senescents, 
thus pointing to a relative intellectual inflexi- 
bility for this age group. 

Halstead [2] approximated the same con- 
clusions in his study of seniles aged 69 through 
83. These old people were best at rote mem- 
ory, immediate visual recognition, and doing 
tasks involving early acquired habit patterns. 
Remembering or learning involving the re- 
versal of old habits was considered to be poor. 


Ruch’s [3] work assumes that later maturi- 
ty should bring a differential set of changes in 
rate of learning. He discovered that older in- 
dividuals were relatively more handicapped if 
the new learning required the attachment of 
an antagonistic response to a previously learned 
S-R connection than if the response was con- 
sonant with past experience. Relatively mean- 
ingless responses stood intermediate in difficulty 
of learning between the consonant and the dis- 
sonant material. His measure of the effects 
of the type of material was in terms of ability 
to form S-R connections. He used, of course, 
a critical score as his measure of performance. 

It seems entirely possible that this learning deficiency is in part due to errors in perception of the material when it is presented. This deficiency might be more marked for the dissonant than for the consonant material and, in turn, more marked for the older than for the younger groups.

1 Now in Dallas, Texas.


Exposure Materials Used 


In order to obtain a clearer view of this 
problem, a task consisting of tachistoscopically 
presented material was chosen for its relation- 
ship to the past experience of the subjects. The 
task consisted of three types of material. 


Type A: This is considered as highly stable, unambiguous material, essentially agreeing with past experience, “true material,” as follows: 10 × 7 = 70, [three items illegible], 6 × 12 = 72, [two items illegible], 11 × 9 = 99, 15 - 5 = 10, and 21 - 2 = 19.

Type B: This material is considered as nonstable, requiring a reversal of past habit associations, and is essentially contrary to past experience, “false material,” as follows: 20 × 4 = 10, 9 × 12 = 90, 18 - 9 = 11, 9 × 14 = 36, 20 - 5 = 19, 33 - 7 = 2[second digit illegible], 12 × 0 = 12, 4 × 12 = 4, 32 - 32 = 1, and 55 - 5 = 49.

Type C: This material is considered low-grade, rather nonsensical information, requiring new associations, and is essentially nonsense, as follows: ten items of the form 28 : 5 = 17 [the remaining nine items are illegible].


Hypotheses Tested 


The hypotheses tested were: (a) Highly 
stable, unambiguous information should be 
relatively intact in later maturity. (b) Non-
sense material which calls for new associations
should show an intermediate decline in old 
age. (c) Information contrary to past experi- 
ence should show the greatest decline in later 
maturity. 


Description of the Subjects 


One hundred and eighty white, recent arrivals at the Ohio State Penitentiary were each individually told that they composed a part of a special group of high-caliber men, and that we wished to compare them on a test with other inmates. These subjects were not psychotic and had normal scores on the Minnesota Multiphasic Personality Inventory. Their IQ scores on the revised Beta examination ranged from 91 to 123. The subjects all agreed they would do their best on the test; in fact, they were quite anxious to get started.

Although it is possible to state theoretically the criteria of functional old age [4], exact means of measuring these criteria have not, as yet, been developed. Therefore, in this study, it was deemed appropriate to use chronological age as the criterion of the aging process. Accordingly, the following terms were used in this study: (I) “Young Group” was used to designate the age period from 20 to 34 years of age, (II) “Middle Maturity Group” was used to designate the age period from 35 through 49 years of age, and (III) “Later Maturity” or “Old Group” was used to designate the age period from 50 to 70 years of age. Each of these groups was composed of 60 individuals of approximately average intelligence and of normal visual acuity.


Experimental Design 


Tachistoscopic exposures of thirty 3” × 5” white index cards containing the various types of information discussed above were randomly presented. The exposure time was 1/10 second. The number of correct perceptions constituted the score.

Sixty subjects were assigned to each of the various groups as described above. Group I was utilized as a control group, while Groups II and III were utilized as experimental groups. The subjects were individually called into the testing room, and seated in front of the Renshaw tachistoscope. The practice trials consisted of tachistoscopic exposure of one of each of the various kinds of material. The following instructions were given to each subject.


Here is a task in which you look into the instrument you see in front of you, like this [demonstrate]. Here is some idea of what to expect. You may see something like this [show one of the practice cards], or you may see something like this [show another practice card]. If you aren’t quite sure of what you see, go ahead and guess. Before running through the regular test cards, I will let you have three practice trials. [After each practice card had been “flashed” the subject was shown exactly what information was contained on the card.] You have now finished the practice test. Do your best, and tell me exactly what you see.


Statistical Analysis of Results 
The results were submitted to an analysis of variance. All the F ratios in Table 1 are significant. However, in order to determine where these significant differences lie, the means, standard deviations, and t ratios of the various data were computed, and are shown in Table 2. The means of the various groups for each kind of material are shown graphically in Figure 1.

Table 1

Analysis of Variances

Source of variation            df    Mean square      F
Group I
  Between types of material     2      389.73       125.31
  Between people               59        7.92         2.55
  Residual (error)            118        3.11
Group II
  Between types of material     2      441.76       158.34
  Between people               59        6.96         2.49
  Residual (error)            118        2.79
Group III
  Between types of material     2      408.10       124.04
  Between people               59        8.03         2.44
  Residual (error)            118        3.29
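Each F in Table 1 is the mean square for a source of variation divided by the residual (error) mean square for the same group. As a small check, the sketch below recomputes the ratios for Groups II and III from the tabled mean squares; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
def f_ratio(ms_effect: float, ms_error: float) -> float:
    """F as the ratio of an effect mean square to the error mean square."""
    return ms_effect / ms_error

# Mean squares as printed in Table 1
table_1 = {
    "Group II": {"types": 441.76, "people": 6.96, "residual": 2.79},
    "Group III": {"types": 408.10, "people": 8.03, "residual": 3.29},
}

f_values = {
    group: {
        "types": round(f_ratio(ms["types"], ms["residual"]), 2),
        "people": round(f_ratio(ms["people"], ms["residual"]), 2),
    }
    for group, ms in table_1.items()
}

for group, values in f_values.items():
    print(group, values)
```

The recomputed ratios agree with the printed F values for these two groups, which suggests the tabled mean squares and F ratios are internally consistent.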











Table 2

Means, SD, and t Ratios for the Various Materials

Group    Material      M       SD     Differences   t ratios
III         A         7.00    2.08       A-B         10.74
            B         2.87    2.19       A-C         13.90
            C         2.34    2.19       B-C          3.76
II          A         8.17    1.76       A-B         10.40
            B         4.40    2.00       A-C         11.71
            C         2.90    2.28       B-C          1.28*
I           A         8.52    1.64       A-B         27.94
            B         4.58    2.34       A-C         32.08
            C         3.58    2.49       B-C          2.23

* Nonsignificant at 5% level of confidence.





[Figure: mean scores (vertical axis, 2 to 9) for materials A, B, and C plotted against age groups I, II, and III, with fitted straight lines labeled Y = 9.42 - .76X, Y = 5.67 - .82X, and Y = 4.18 - .62X.]

Fig. 1. Comparison of the three types of materials in each age group.


Discussion 


The analysis of variances reveals a real dif- 
ference between types of material exposed and 
between individuals within each age group. 

An interaction is probably demonstrated by 
the nonsignificant t ratio of 1.32 on material
C, between Groups III and II. Perhaps this 
interaction pertains to a relaxation process tak- 
ing effect in the ages after 35. 

Judging from Figure 1, a deficit following 
a definite straight line from young to old oc- 
curs in material C. The dotted lines, con- 
structed by the least-square method, show the 
deficit that should occur on materials B and C. 
The line for material C is, of course, identical 
with the least-square line of best fit. 
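The least-square lines referred to above can be approximated from the group means in Table 2. The sketch below fits a straight line to the three means for each material, coding the age groups I, II, and III as 1, 2, and 3; that coding, like the function name, is an assumption made for illustration and is not stated in the article.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

age_groups = [1, 2, 3]  # Groups I, II, III (assumed coding)
means = {
    "A": [8.52, 8.17, 7.00],
    "B": [4.58, 4.40, 2.87],
    "C": [3.58, 2.90, 2.34],
}

lines = {material: fit_line(age_groups, ys) for material, ys in means.items()}
for material, (intercept, slope) in lines.items():
    print(material, round(intercept, 2), round(slope, 2))
```

With this coding, the fitted lines for materials A and C match the intercepts and slopes shown in Figure 1 to two decimals, and the slopes for all three materials are of similar size.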

The amount of retardation in deficit in materials A and B is theoretically demonstrated by the vertical space between the dotted and solid lines. Evidently, the largest deviation of materials A and B from the general trend occurs in the middle age group and grows imperceptibly less as the extreme ages are approached.

No particular relationship is demonstrated between kinds of material and age. The slopes of the various regression lines are practically identical and slight variations from parallel are treated as error. Thus, the study shows a relationship between performance and age, with all three groups experiencing greater difficulty in learning the nonsense material. Apparently then, materials requiring the reversal of old habits show a substantially greater deficit in the aged, but the greatest deficit is manifested in materials calling for the learning of new associations. Hypotheses (b) and (c) are untenable.


Table 3

t Scores between Groups on Various Conditions

               t ratios under Condition
Groups         A         B         C
III-II        3.26      3.92      1.32*
III-I         4.27      4.16      2.82
II-I          1.11*      .44*     1.56*

* Nonsignificant at 5% level of confidence.
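The article does not give the formula behind the t ratios in Table 3. A common choice for two independent groups, assumed here purely for illustration, divides the difference between means by the square root of the summed squared standard errors. The sketch applies it to material A using the means and standard deviations of Table 2 with 60 subjects per group; the function name is hypothetical.

```python
import math

def independent_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """t for the difference between two independent group means."""
    standard_error = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / standard_error

# Material A: Group II against Group III, and Group I against Group II
t_ii_vs_iii = independent_t(8.17, 1.76, 60, 7.00, 2.08, 60)
t_i_vs_ii = independent_t(8.52, 1.64, 60, 8.17, 1.76, 60)

print(round(t_ii_vs_iii, 2), round(t_i_vs_ii, 2))
```

Under this assumption the values come out close to, though not identical with, the tabled 3.26 and 1.11, so the author's exact computation may have differed in detail.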


Conclusions 

Decline appears to be most apparent in 
learning requiring new associations, rather 
than learning which demands a reorganization 
of previously formed habit patterns. No evi- 
dence for any relationship between kinds of 
material and decline was found; but, rather a 
consistent relationship between age and per- 
formance was discovered. The younger per- 
son performs better, but experiences approxi- 
mately the same difficulty with the various 
types of materials as the older person. 


Received October 8, 1952 


References 


1. Gilbert, Jeanne G. Mental efficiency in senes-
cence. Arch. Psychol., N.Y., 1935, No. 188.

2. Halstead, H. A psychometric study of senility. 
J. ment. Sci., 1943, 89, 363-373. 

3. Ruch, F. The differential effects of age upon 
human learning. J. gen. Psychol., 1934, 11, 
261-286. 

4. Stieglitz, E. J. Geriatric medicine. Philadel- 
phia: Saunders, 1943. 











Journal of Consulting Psychology 
Vol. 17, No. 3, 1953 


Wechsler-Bellevue Split-Half Subtest Reliabilities: 
Differences in Age and Mental Status 


Jack Botwinick 
Section on Gerontology, National Heart Institute 
National Institutes of Health, PHS, FSA, Bethesda, Maryland 
and the Baltimore City Hospitals 
Baltimore, Maryland 


Considering the extent to which the Wech- 
sler-Bellevue test [13] is used and the various 
groups to which the test is administered, the 
number of studies investigating the reliability 
of the test is limited [10]. Derner, Aborn, 
and Canter [2], and Webb and De Haan 
[12] have examined “normal” subjects, while
others [4, 5, 8, 9] have utilized psychiatric
subjects. Webb and De Haan, who investi-
gated subtest reliability in a paranoid schizo-
phrenic group as well as in a “normal” one,
and Gilhooly [4], who investigated the re-
liability of only 4 verbal subtests, computed
coefficients of reliability by the split-half meth- 
od. The other investigators computed reli- 
ability by the retest method. 


Rabin and Guertin [10] reported that the 
studies utilizing psychiatric subjects are too 
inconclusive because the retest reliabilities were 
confounded with clinical changes and with 
practice effects. It was for this reason, plus 
the fact that the Wechsler-Bellevue test is 
usually administered in one test period, that 
Webb and De Haan believed it important to 
compute reliability by the split-half method. 

Webb and De Haan [12] reported higher split-half reliability coefficients in the performance subtests for their paranoid schizophrenic group than for their “normal” group, although only one difference between the coefficients was significant at the 1 per cent level and one at the 5 per cent level. Helmick [6] pointed out that since the magnitudes of reliability coefficients are dependent upon the variability in the groups, the conclusions reached by Webb and De Haan were unwarranted. He recalculated the data of Webb and De Haan and found that when the respective variabilities were allowed for, the magnitudes of the reliability coefficients were about the same for the paranoid schizophrenic and the “normal” groups.

1 The criticism and assistance of N. W. Shock in the preparation of this report is gratefully acknowledged. Mrs. E. Benser gave valuable aid in the computational work.

The present study was designed to extend 
our knowledge of Wechsler-Bellevue split-half 
reliabilities. 

Procedure 


Comparison was made between the subtest 
reliabilities computed by Webb and De Haan 
for a “normal” group [12] with the subtest 
reliabilities of a “normal” but older group. In 
addition, comparison was made between sub- 
test reliabilities of the older “normal” group 
and those of a matched group comprised of 
hospitalized patients diagnosed as “senile psy- 
chosis” and “psychosis with cerebral arterio- 
sclerosis.” 


Subjects of the older “normal” group were originally examined for the purpose of another study by Fox and Birren [3]. The mean age of this sample population was 64.3 years, σ = 3.19, and the mean age of the sample population of Webb and De Haan [12] was 37.6 years, σ = 11.20. The former group will henceforth be referred to as the “old control” and the latter as the “young control.” The mean IQ of the old control group was 100.8, σ = 11.4, and of the young control, mean IQ = 97.8, σ = 11.07. Both groups consisted of 50 white subjects who were apparently free of aberrant mental conditions. The only determinable difference between these two groups, in addition to the above, was a sex difference. The young control group consisted of women whereas the old control included both men and women. The hospitalized group that was compared with the old control group was originally examined for the purpose of a validity study [1]. It was indicated that the old control group and the hospitalized group were comparable except for mental status. The hospitalized group will be referred to as the “senile group.” It included 31 male and female subjects.

The method of Webb and De Haan was 
used for estimating the reliability coefficients. 
The scores on odd versus even items of the 
subtests were correlated and reliability was 
estimated by the Spearman-Brown formula 
for double length of test. The reliability of 
the Digit Span subtest was determined by cor- 
relating Digits Forward and Digits Backward 
instead of correlating odd versus even items. 
The reliability coefficient of the Object As- 
sembly subtest was estimated by intercorre- 
lating the three items, transforming the inter- 
correlations to z values, and retransforming 
the mean z value to the correlation coefficient. 
The reliability coefficient was estimated by the 
Spearman-Brown formula for triple length. 
The reliability coefficient for the Digit Symbol 
subtest was not determined since the total sub- 
test is measured by time and data were not 
available for individually timed items. 
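The two estimation steps described above can be sketched as follows: the Spearman-Brown formula projects a part-test correlation to the full test length, and several item intercorrelations are averaged through the z transformation before being projected to triple length. The correlations used below are invented for illustration and are not values from the study; the function names are hypothetical.

```python
import math

def spearman_brown(r: float, k: float) -> float:
    """Reliability of a test lengthened k times, from part-test correlation r."""
    return k * r / (1 + (k - 1) * r)

def mean_r_via_z(correlations):
    """Average correlations by way of Fisher's z transformation."""
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))

# Hypothetical odd-even correlation, stepped up to double length
odd_even_r = 0.60
full_length_reliability = spearman_brown(odd_even_r, 2)

# Hypothetical intercorrelations of three items, averaged and stepped up
item_intercorrelations = [0.30, 0.40, 0.35]
average_r = mean_r_via_z(item_intercorrelations)
triple_length_reliability = spearman_brown(average_r, 3)

print(round(full_length_reliability, 2), round(triple_length_reliability, 2))
```

An odd-even correlation of .60 steps up to .75 at double length, which shows why the corrected coefficients run higher than the raw half-test correlations.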

Mean differences in coefficients of reliability between old and young control groups and between old control and senile groups were determined by z-value transformations.
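A comparison of two independent correlations by z-value transformation is usually carried out as in the sketch below: each coefficient is converted to Fisher's z, and the difference is divided by the standard error sqrt(1/(N1 - 3) + 1/(N2 - 3)). Whether the author followed exactly this procedure is an assumption. The example uses the Picture Arrangement and Comprehension coefficients for the senile and old control groups as printed in Table 1.

```python
import math

def z_difference(r_1, n_1, r_2, n_2):
    """Normal deviate for the difference between two independent correlations."""
    z_1 = math.atanh(r_1)
    z_2 = math.atanh(r_2)
    standard_error = math.sqrt(1 / (n_1 - 3) + 1 / (n_2 - 3))
    return (z_1 - z_2) / standard_error

# Senile group (N = 31) against old control group (N = 50)
z_picture_arrangement = z_difference(0.87, 31, 0.55, 50)
z_comprehension = z_difference(0.69, 31, 0.35, 50)

print(round(z_picture_arrangement, 2), round(z_comprehension, 2))
```

On this reading the Picture Arrangement difference exceeds the two-tailed .01 value of 2.58, while the Comprehension difference falls between 1.96 and 2.58, in line with the significance levels reported in the Results.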

To allow for the differences in variability between the groups, the method suggested by Helmick [6] was used. That is, the standard error of measurement [7, p. 130] was computed for each subtest in the groups and F tests were applied to the standard errors of measurement between groups. In addition, the coefficients of reliability were corrected for differences in range [7, p. 134] and the corrected reliabilities were compared by z-value transformations.
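The quantities named in this paragraph can be sketched with the usual textbook formulas, which are assumed here because the pages cited in [7] are not reproduced in the article. The standard error of measurement is the standard deviation multiplied by sqrt(1 - r); an F ratio may be formed from two squared standard errors of measurement; and a reliability may be corrected for range on the assumption that the error variance is the same in both groups. Function names and the standard deviations are hypothetical; the two standard errors in the F example are the Picture Arrangement values printed in Table 1.

```python
import math

def standard_error_of_measurement(sd: float, r: float) -> float:
    """Standard error of measurement from a standard deviation and reliability."""
    return sd * math.sqrt(1.0 - r)

def error_variance_ratio(sem_a: float, sem_b: float) -> float:
    """F ratio of the larger to the smaller squared standard error."""
    larger, smaller = max(sem_a, sem_b), min(sem_a, sem_b)
    return (larger / smaller) ** 2

def range_corrected_r(r: float, sd_observed: float, sd_target: float) -> float:
    """Reliability expected at sd_target, assuming equal error variance."""
    return 1.0 - (sd_observed ** 2 / sd_target ** 2) * (1.0 - r)

# Hypothetical subtest with SD 3.0 and reliability .84
sem_example = standard_error_of_measurement(3.0, 0.84)

# Picture Arrangement: young control 2.23, old control 1.57 (Table 1)
f_picture_arrangement = error_variance_ratio(2.23, 1.57)

# Hypothetical correction of r = .55 from a group SD of 2.0 to one of 3.0
r_corrected = range_corrected_r(0.55, 2.0, 3.0)

print(round(sem_example, 2), round(f_picture_arrangement, 2), round(r_corrected, 2))
```

The range correction raises the coefficient when the target group is more variable, which is why comparisons of uncorrected reliabilities between groups of unequal variability can mislead.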

Results 

The corrected subtest reliability coefficients of the old control and senile groups are presented in Table 1 (columns marked r). One reliability, that of the Comprehension subtest of the old control group (r = 0.35) is so low as not to be significantly different from zero (p < .01). Three coefficients prior to correction by the Spearman-Brown prophecy formula were also not statistically greater than zero (p < .01): the Comprehension and Object Assembly subtests of the old control group, and the Digit Span subtest of the senile group. Webb and De Haan [12] reported that the lowest four uncorrected coefficients for the young control sample population were not significantly above zero at p < .01, and that the corrected coefficient for the Picture Arrangement subtest was below this level.

Table 1

Age Differences in Reliability and Standard Error of Measurement of Subtests of the Wechsler-Bellevue Scale

                              A                    B                    C
                        Young control*        Old control            Senile
                           N = 50               N = 50               N = 31
Subtests                 r     σ meas.        r     σ meas.        r     σ meas.
Vocabulary†             .94     1.37         .96     1.48         .94     1.69
Arithmetic              [entries illegible]
Information             .82     1.58         .90     1.21         .93     1.59
Block Design            .76     2.90         .84     2.51         .89     2.14
Similarities            .74     1.80         .89     1.47         .83     1.43
Comprehension           .53     1.96         .35     2.22         .69     1.89
Object Assembly‡        .46     1.90         .62     2.50         .66     2.82
Digit Span§          [illeg.]   1.45      [illeg.]   1.07      [illeg.]  [illeg.]
Picture Completion      [entries illegible]
Picture Arrangement     .29     2.23         .55     1.57         .87     1.18

* From data of Webb and De Haan [12].

† N = 36 for Column A and N = 34 for Column B of the Vocabulary subtest.

‡ N = 87 for Column A of the Object Assembly subtest. Object Assembly reliability coefficients were estimated by the average intercorrelations of the items.

§ N = 49 for Column A of the Digit Span subtest.


When subtest reliability coefficients were compared between control groups, i.e., between young and old control, only the difference in reliability of the Picture Completion subtest was significant at the 1 per cent confidence level. The remaining subtest reliabilities were not significantly different at the .01 level although the reliability coefficients of the Digit Span subtest and the Similarities subtest were significantly different at p < .05. Significant differences in reliability coefficients between old control and senile groups were seen in the Arithmetic and Picture Arrangement subtests (p < .01). The difference in reliability of the Comprehension subtest was significant at p < .05 but no significant differences in reliabilities were indicated even at this confidence level for the other subtests.

The differences between the old control and senile groups with respect to the standard errors of measurement presented in columns marked σ meas. in Table 1 were not statistically significant. Age differences in the standard errors of measurement were significant for the Picture Arrangement subtest at the 1 per cent level and for the Digit Span subtest at the 5 per cent level.²

When the reliability coefficients were cor- 
rected for range and compared by z-value 
transformations, a difference between the two 
elderly groups was found at the 5 per cent 
level for only the Information subtest and no 
statistically significant differences, even at this 
confidence level, were found for the remain- 
ing subtests. When age comparisons were 
made for the reliability coefficients corrected 
for range, only the Object Assembly and Pic- 
ture Completion subtests were found to be sig- 
nificant. The confidence levels were 5 and 1 
per cent respectively when the range of the old 
control was corrected to that of the young 
control, and 1 and 5 per cent respectively when 
the range of the young control was corrected 
to that of the old control. 


Discussion 


The results of this investigation have indicated that the range of the coefficients of reliability for the different subtests was large in all groups. From Webb and De Haan’s data [12], differences of .65 and .37 between the highest and lowest reliability coefficients were computed for a “normal,” i.e., young control, and a paranoid schizophrenic group respectively. The present data indicated ranges of .61 and .43 for the old control and senile psychotic groups respectively. In general, the larger reliability coefficients were of about the same magnitude for all groups, but the lower coefficients appeared to increase in magnitude with age and psychoses. This was also reflected in the direct comparisons of reliability coefficients between the two control groups and between the control and psychotic groups of similar age. Webb and De Haan reported that the subtests in which the paranoid schizophrenic group were inferior were the ones in which higher reliabilities were manifested. In the present data, performance was lower in the senile group than in the old control group for all subtests. Yet those subtest reliabilities that were significantly different were in all instances higher in the psychotic group than in the old control group.

2 The author wishes to thank Drs. Webb and De Haan, who, on request, forwarded their data for the Object Assembly subtest so that the present analysis would be complete.

Because of the relationship between reliabili- 
ty coefficients and range of ability, these find- 
ings suggested that there were group differ- 
ences with respect to variability, and that vari- 
ability increased with age and psychoses. Hel- 
mick [6] reported that in all but one subtest, 
Webb and De Haan’s paranoid schizophrenic 
group were more variable than the young con-
trol group, although only one difference in
variance was significant at less than the 5 per
cent level. In the present data, the old control
group was more variable than the young con-
trol group in seven of the ten subtests, although
only the Object Assembly and Picture Com- 
pletion subtests were significantly different 
with respect to variance (p < .01). The senile 
group was more variable than the old control 
group in six of the ten subtests, but three sub- 
tests were significantly different with respect 
to variance (Arithmetic and Information at p 
< .01, and Picture Arrangement at p < .05). 

In view of these differences in variability, 
the standard error of measurement appeared to 
be a more adequate index for comparison pur- 
poses. Helmick [6] found no significant dif- 
ferences in the standard errors of measure- 
ments between the young control and young 
psychotic groups and the present data indicated 
no significant differences between the old con- 
trol and senile groups. Therefore, the indica- 
tions are that in these studies there were ap- 
proximately equal errors in measurement with 
the Bellevue subtests for the different mental 
conditions within an age group. Between age
groups, however, i.e., between young and old
control groups, the Digit Span and Picture
Arrangement subtests did not show equal errors
of measurement; the error was greater in the
young control group.
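
The comparison above rests on the standard error of measurement, which is conventionally computed as SD * sqrt(1 - r). The following is a minimal sketch of that conventional formula, not the author's own computation; the function name and the two example groups are hypothetical and are not taken from the article. It shows how a more variable group can have a higher reliability coefficient and yet the same error of measurement.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Conventional SEM: SD * sqrt(1 - r), in the same units as the scores."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical groups (illustrative values only, not data from the study):
# the wider group has the higher reliability but an identical SEM.
narrow_group = standard_error_of_measurement(sd=2.0, reliability=0.75)
wide_group = standard_error_of_measurement(sd=4.0, reliability=0.9375)

print(narrow_group, wide_group)
```

Because the coefficient rises with the range of ability while the error of measurement need not, the latter is the safer index when groups differ in variance.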

Conclusions drawn from the comparisons of 
reliability coefficients corrected for range are 
tenuous. As Webb [11] indicated, this correc- 
tion creates a theoretical population and has 
more statistical than clinical meaning. Ac- 
cordingly, the differences between groups in 
reliability coefficients corrected for range were 
not given the consideration that the differences 
in uncorrected coefficients and standard errors 
of measurement were given. 

The present results were congruous with 
those of other reliability studies [2, 4, 5, 8, 9, 
12] on the Wechsler-Bellevue test. For pat- 
tern analysis and diagnostic considerations, 
where each subtest score is evaluated in terms 
of its deviation from the other scores, the error 
of measurement and the reliability of subtests 
must be improved before confidence in evalua- 
tion can be expected. 


Summary 


1. Three groups that were examined for 
the purposes of separate studies [1, 3, 12] 
were used. Two groups were apparently free 
of aberrant mental conditions and were ap- 
parently matched except for age and sex. The 
third group was comprised of 31 men and 
women hospitalized for senile psychosis and 
psychosis with cerebral arteriosclerosis. This 
group was matched with the older “normal” 
group and the only apparent difference was 
mental status. 

2. The general method of computing the 
reliability coefficients was odd-even item cor- 
relations corrected by the Spearman-Brown 
prophecy formula. The split-half reliability 
coefficients for each subtest were compared between
“normal” groups of different age and
between the groups of the same age but differ- 
ent mental status. 

3. Age comparisons in reliability coeffici- 
ents indicated that a difference at p < .01 was 
found only for the Picture Completion sub- 
test. Comparisons between groups of different 
mental conditions of the same age indicated 
differences at p < .01 in reliability coefficients 
for the Arithmetic and Picture Arrangement 
subtests. When significant differences occurred, 
poorer performance scores were found for those
subtests of higher reliability. Accordingly,
standard errors of measurements were com- 
puted for each subtest and comparisons be- 
tween groups were made. No differences were 
found between the old “normal” and senile 
psychotic groups but the Picture Arrangement 
subtest was significantly different at p < .01 
between old and young “normal” groups. 

4. The coefficients of reliability and the 
standard errors of measurements, in general, 
were found to be too indicative of error to 
warrant confidence in making diagnostic evalu- 
ations by pattern analysis from scores on in- 
dividual subtests. 
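
The odd-even procedure named in point 2 can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the author's worksheet: the item-score layout (one row per subject, one column per item) and all function names are hypothetical. The step-up uses the usual Spearman-Brown form for a test of doubled length, r_full = 2r / (1 + r).

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to full-test length."""
    return 2.0 * r_half / (1.0 + r_half)


def split_half_reliability(item_scores):
    """Odd-even split: correlate half-test totals, then apply Spearman-Brown."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_totals, even_totals))


# Hypothetical item scores for three subjects on a four-item subtest.
demo_scores = [[1, 1, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
demo_reliability = split_half_reliability(demo_scores)
```

A half-test correlation of .60, for example, steps up to .75 for the full subtest.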


Received October 13, 1952. 


References 


1. Botwinick, J., & Birren, J. E. The measurement
of intellectual decline in the senile psychoses.
J. consult. Psychol., 1951, 15, 145-150.

2. Derner, G. F., Aborn, M., & Canter, A. H. The
reliability of the Wechsler-Bellevue subtests
and scales. J. consult. Psychol., 1950, 14, 172-179.

3. Fox, Charlotte, & Birren, J. E. Intellectual
deterioration in the aged: agreement between the
Wechsler-Bellevue and the Babcock-Levy. J.
consult. Psychol., 1950, 14, 305-310.

4. Gilhooly, F. M. Wechsler-Bellevue reliability
and the validity of certain diagnostic signs of
the neuroses. J. consult. Psychol., 1950, 14, 82-87.

5. Hamister, R. C. Test-retest reliability of the
Wechsler-Bellevue. J. consult. Psychol., 1949,
13, 39-44.

6. Helmick, J. S. Reliability or variability? J.
consult. Psychol., 1952, 16, 154-155.

7. McNemar, Q. Psychological statistics. New
York: Wiley, 1949.

8. Rabin, A. I. Fluctuations in the mental level
of schizophrenic patients. Psychiat. Quart.,
1944, 18, 78-91.

9. Rabin, A. I. Test constancy and variation in
the mentally ill. J. gen. Psychol., 1944, 31,
231-239.

10. Rabin, A. I., & Guertin, W. H. Research with
the Wechsler-Bellevue test: 1945-1950. Psychol.
Bull., 1951, 48, 211-248.

11. Webb, W. B. Corrections for variability: a
reply. J. consult. Psychol., 1952, 16, 156.

12. Webb, W. B., & De Haan, H. Wechsler-Bellevue
split-half reliabilities in normals and
schizophrenics. J. consult. Psychol., 1951, 15,
68-71.

13. Wechsler, D. The measurement of adult intelligence.
(3rd Ed.) Baltimore: Williams & Wilkins, 1944.







Journal of Consulting Psychology 
Vol. 17, No. 3, 1953 


Changes in Wechsler-Bellevue Test Performance 
following Prefrontal Lobotomy 


Earl D. Markwell, Jr., William M. Wheeler 


Veterans Administration Center, Los Angeles, Calif. 


and Helen Kitzinger 


University of Southern California 


Studies of the results of using the Wechsler- 
Bellevue Intelligence Scale [9] on various 
categories of neuropsychiatric patients have 
been numerous since its publication in 1939. 
Despite the wide interest in and use of this test, 
few such studies have concerned themselves 
with the changes in the subtest scores resulting 
from the prefrontal lobotomy operation. Of 
the various articles which have reviewed re- 
search with the Wechsler-Bellevue test [6, 7, 
8], only the most recent survey article by 
Rabin and Guertin indicates such a study. In 
this study McCullough [3] tested 10 patients 
one week before and two months after the 
operation. Statistically significant gains were 
found on the digit span and picture arrange- 
ment subtests only. Rabin and Guertin, in 
their review, comment on this study that the 
demonstrated gains attributed to the inter- 
vening therapy might have been due to 
practice effects. A recent article concerning 
test-retest reliability [1] shows that these are 
the two subtests with the lowest reliability, 
and strengthens the suggestion that the ob- 
tained results may well be due to practice. 

The present study was designed to deter- 
mine whether components of the IQ shift 
following lobotomy, even in those cases where 
the IQ remains relatively constant. It was 
hoped that such a finding would throw some 
light on the psychological effects of prefrontal 
lobotomy and consequently on the functions 
of the prefrontal lobes. In terms of the 
Wechsler, it was hoped that some cues as to 
the “functions” measured by each of the subtests
might be gained.


From the Veterans Administration Center, Los
Angeles, Calif.

Thus the study was expected to have a 
dual function: first, to clarify frontal-lobe
function, and second, to clarify the meaning 
of shifts in subtest scores on the Wechsler- 
Bellevue. The criticism can be made that the 
study is circular and in a very real sense it 
is. The purpose of discovering concomitant 
variation seems quite defensible, however. 


Method 


Wechsler-Bellevue, Form I, weighted sub- 
test scores obtained before lobotomy and at 
least four months after lobotomy were com- 
pared. In terms of standard statistical tech- 
niques, the question to be answered was: is 
the variation in scores obtained before lobo- 
tomy different from the variation in scores 
obtained after lobotomy?

A group of 17 patients was tested and re- 
tested, the retest being given between 4 and 
22 months postoperatively. Fourteen of these 
patients were males who were between 23 
and 38 years of age at the time of the pre- 
lobotomy test session. All 14 were classified as 
having a schizophrenic reaction; 8 as para- 
noid, 1 as probably paranoid, 2 as hebe- 
phrenic, 1 as catatonic, 1 as mixed, and 1, 
while diagnosed as a paranoid schizophrenic 
at the time of lobotomy, had strong psycho- 
pathic elements as indicated by both previous 
and later psychiatric opinion. 

The three female patients ranged in age 
from 25 to 28 years at the time of preopera- 
tive testing, and were all diagnosed as schizo- 
phrenics: 1 as paranoid, 1 as probably para- 
noid, and 1 as an unclassified schizophrenic. 

All of the scoring of the various tests was
checked by the authors. Where the original
scoring was questionable, recourse was had to 
the supplementary guide for administering 
and scoring the Form I scale [2]. 

Psychiatric diagnosis and all pertinent case 
history data were checked for accuracy with 
the particular entries in the patient’s general 
clinical case file. 


Results 


The means and standard deviations for the 
pre- and postlobotomy subtest scores for the 
group of 17 patients were computed. There 
were no statistically significant differences 
in scores from the two testing situations. The
degree of similarity between pre- and postlobotomy
subtest scores was obtained by computing the
correlation coefficient between the two conditions
for each of the subtests and for the verbal,
performance, and total weighted scores. The limits
of these correlations were then computed according to the
r-transformation technique [4]. These re- 
sults are summarized in Table 1. 
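
Limits of this kind can be sketched with Fisher's transformation of r to z, assuming that this is the technique cited from McNemar [4]. The confidence level is not stated in the text; the 99 per cent level used here (critical value 2.576) is an assumption, and the function name is hypothetical.

```python
import math


def correlation_limits(r: float, n: int, z_crit: float = 2.576):
    """Confidence limits for a correlation via Fisher's r-to-z transformation.

    z_crit = 2.576 corresponds to a 99 per cent interval (an assumed level).
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = math.atanh(r)                    # transform r to z
    se = 1.0 / math.sqrt(n - 3)          # standard error of z
    lower = math.tanh(z - z_crit * se)   # transform back to r
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# Example with r = .89 and N = 17, the Arithmetic entry of Table 1.
lower, upper = correlation_limits(0.89, 17)
print(round(lower, 2), round(upper, 2))
```

With only 17 cases the interval is wide and asymmetric about r, which is why coefficients that look quite different in Table 1 may still have overlapping limits.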


Table 1
Means, Standard Deviations, and Correlation
Coefficients for the Pre- and Postlobotomy Conditions
(Scores in equivalent weighted score units; N = 17)


Test            Means            Standard Deviations    Correlation         Limits
Scales      Pre-      Post-        Pre-      Post-      Coefficient     Max.      Min.

I         [illeg.]  [illeg.]    [illeg.]  [illeg.]         .85        [illeg.]  [illeg.]
C           6.70      7.53        7.8     [illeg.]         .74        [illeg.]  [illeg.]
D           5.94      6.29        6.9       7.3            .17          .68      -.46
A           6.52      6.58        8.0       8.0            .89          .97       .64
S           6.76      7.32        8.2       9.0            .79        [illeg.]    .38
V           8.05      8.11        8.9       8.8            .96          .99       .86
PA          6.58    [illeg.]    [illeg.]  [illeg.]         .82        [illeg.]  [illeg.]
PC          7.41      7.82        8.6       8.6            .82          .95       .45
BD          7.70      8.76      [illeg.]  [illeg.]         .85        [illeg.]  [illeg.]
OA          7.94      9.23        9.2      10.2            .94        [illeg.]    .79
DS          5.58      6.88        6.6       7.6            .71          .92       .22

Totals
V          34.64     38.29       38.3      41.9            .84          .96       .50
P          35.52     41.35       40.2      44.7            .90          .97       .66
T          70.18     79.65       81.6      86.1            .71          .92       .22





Even though there were no significant 
differences between pre- and postlobotomy 
subtest scores, it was believed that a comparison
of the correlation coefficient for each of
the subtests where the “lobotomy variable” 
had intervened with test-retest reliability coef- 
ficients, where no lobotomy had intervened, 
would throw some light on the effects of the 
operation. 

Two of the subtests, the digit span and the 
block design, were further analyzed to de- 
termine intra-subtest differences. 

The means for the pre- and postlobotomy 
digit span subtest scores for “forward” and 
“backward” were computed. No differences 
which were statistically significant were dis- 
covered. For the block design subtest the 
results for Designs IV and VI, commonly be- 
lieved clinically to reveal organic involvement, 
were separated to discover any differences be- 
tween the pre- and postlobotomy conditions. 
Counter to expectation, no increase in the ex- 
tent of organic involvement occurred follow- 
ing the lobotomy process. The only evidence 
obtained was that there was a slight but non- 
significant gain in the ability to do the easier 
of these block designs (IV). 

The deterioration coefficients for the pre- 
and postlobotomy conditions were computed 
by using the respective subtest means for each 


Table 2
Comparison Table of Pre- and Postlobotomy
Correlation Coefficients and Reliability
Test-retest Coefficients


Test        Present      Rabin       Hamister
Scales      Study*       Study*      Study*
            (N=17)       (N=30)      (N=34)

I            .85          .89          .94
C            .74          .12          .78
D            .17          .62          .63
A            .89          .75          .87
S            .79          .38          .84
V            .96        [illeg.]       .90
PA           .82          .54          .78
PC           .82          .32          .68
BD           .85          .71          .67
OA           .94          .31          .62
DS           .71          .34          .79

Totals
V            .84          .73          .91
P            .90          .52          .80
T            .71          .55          .84


Note.—Rabin and Hamister studies from Derner,
Aborn, and Canter [1].

*Test-retest intervals: Present study, 4-22 mos.;
Rabin study, 1-86 mos.; Hamister study, 4-1 mo.







of these groups of scores. It was found that 
for the prelobotomy mean subtest scores there 
was an 18.77 per cent loss as compared to a 
postlobotomy loss of 16.83 per cent. This 
difference was again nonsignificant. Stated 
another way, the results show that there was 
a small, nonsignificant reduction in the value 
of the deterioration coefficient following lobo- 
tomy for the 17 patients tested. 
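
A percentage loss of this kind can be sketched on the assumption that the coefficient follows Wechsler's comparison of "hold" and "don't hold" subtests [9], i.e., loss = (hold - don't hold) / hold. The article does not state which subtests were placed in each set, so the function below takes the two sums as inputs; the function name and the example sums are hypothetical and are not the study's figures.

```python
def deterioration_percent(hold_sum: float, dont_hold_sum: float) -> float:
    """Per cent loss: 100 * (hold - don't hold) / hold.

    The grouping of subtests into the two sums is assumed to be supplied
    by the caller; it is not specified in the article.
    """
    if hold_sum <= 0:
        raise ValueError("hold_sum must be positive")
    return 100.0 * (hold_sum - dont_hold_sum) / hold_sum


# Hypothetical sums of mean weighted scores (illustrative only).
example_loss = deterioration_percent(hold_sum=32.0, dont_hold_sum=26.0)
print(example_loss)
```

On this definition a smaller percentage after the operation means that the "don't hold" subtests moved closer to the "hold" subtests.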

Table 2 summarizes the coefficients for the 
two test-retest samples and for our data. It 
is evident immediately that the three samples 
are quite similar. 

The most striking dissimilarities occur in 
the cases of the digit span subtest and the ob- 
ject assembly subtest. No explanation of the 
latter seems feasible unless we interpret it to 
mean that the reported reliabilities are too 
low for our population. 

In the case of digit span, however, it is pos- 
sible to hypothesize that the lobotomy opera- 
tion in some way disturbs this type of func- 
tion. Remembering that no significant differ- 
ence between pre- and postlobotomy means 
was discovered (5.94 and 6.29 respectively), 
it is not possible to conclude that there is a
systematic increase or decrease in some psycho- 
logical function measured by the digit span 
subtest, but rather that this area of functioning
is more affected than others by the operation
in some nonsystematic fashion.

The comprehension and arithmetic subtests 
for the two conditions were then examined to 
see if any change had occurred as a result of 
the lobotomy in the number of items answered 
for each of these subtests, regardless of the 
quality or score received on the answer to the 
particular item. No statistically significant 
changes occurred in the patients’ ability to
answer the various items composing the subtest. 

In an article published in 1941, Rabin [5] 
reported an index which he found aided the 
separation of schizophrenic patients from non- 
psychotic patients. This index showed no 
change in the extent of the patients’ schizo- 
phrenic reaction pattern following lobotomy. 

The 17 patients used in all of the statisti- 
cal computations above were drawn from a 
finite population of 55 patients who have had 
the lobotomy operation. One of the most 
striking effects of the lobotomy was the increase
in the cases who could be tested. For
the total of 55 patients, the empirical evidence
indicates that the lobotomy operation
did increase the number of testable patients by 
20 per cent, although for 5.4 per cent of 
them only a partial improvement in “testabil- 
ity” occurred, i.e., they gave answers to some 
but not all of the subtests. 


Summary 


Pre- and postlobotomy Wechsler scores of 
17 psychotic patients were compared. While 
a general tendency to improve was noticed, 
there were few significant changes. Changes in 
two subtests (digit span and object assembly) 
showed a dissimilarity with test-retest changes, 
as indicated by correlation coefficients. Digit 
span for the lobotomy group was more vari- 
able, and object assembly less variable than in 
the “control” condition. Thus the psycholog- 
ical function measured by digit span appears 
to be the most disturbed by the operation. 
Several comparisons were made (e.g., deterior- 
ation coefficient, schizophrenic index), but the 
most striking change was in the increase of 
testable patients by 20 per cent after lobotomy. 


Received October 3, 1952. 


References 


1. Derner, G. F., Aborn, M., & Canter, A. H. The 
reliability of the Wechsler-Bellevue subtests and 
scales. J. consult. Psychol., 1950, 14, 172-179. 

2. Kitzinger, Helen, & Blumberg, E. Supplementary 
guide for administering and scoring the Wechs- 
ler-Bellevue Intelligence Scale, Form I. Psychol. 
Monog., 1951, 65, No. 2 (Whole No. 319).

3. McCullough, M. W. Wechsler-Bellevue change 
following prefrontal lobotomy. J. clin. Psychol., 
1950, 16, 270-273. 

4. McNemar, Q. Psychological statistics. New 
York: Wiley, 1949. 

5. Rabin, A. I. Test score patterns in schizophren- 
ia and nonpsychotic states. J. Psychol., 1941, 12, 
91-100. 

6. Rabin, A. I. The use of the Wechsler-Belle- 
vue scales with normal and abnormal persons. 
Psychol. Bull., 1945, 42, 410-422. 

7. Rabin, A. I., & Guertin, W. H. Research with
the Wechsler-Bellevue test: 1945-1950. Psychol. 
Bull., 1951, 48, 211-243. 

8. Watson, R. I. The use of the Wechsler-Bellevue 
scales: a supplement. Psychol. Bull., 1946, 43, 
61-68. 

9. Wechsler, D. The measurement of adult intelli- 
gence. (3rd Ed.) Baltimore: Williams & Wil- 
kins, 1944. 








New Books and Tests


Books 


Abramson, Harold A. Problems of consciousness. 
New York: Josiah Macy, Jr. Foundation, 1952. 
Pp. 10 + 146. $3.25. 


This small book of 156 pages embodies the trans- 
actions of the third of the conferences on problems 
of consciousness held by the Josiah Macy, Jr. 
Foundation. The report is a stenographic transcript 
of prepared papers and the comments and discussion
of the 15 participants. The interchange is not
only spontaneous and interesting, but frequently re- 
vealing, and is a beautiful example of the value of 
verbatim recording. Editing was presumably mini- 
mal. The papers were on consciousness and brain 
metabolism by Seymour Kety, on hypnotic phe- 
nomena by Lewis Wolberg, and on experimental 
work on sleep and other variations of consciousness 
by Marcel Monnier. Participants included represen- 
tatives of various medical and biological disciplines, 
psychologists, sociologists, and anthropologists. The 
difficulties the group experienced in attempting to 
reach a commonly accepted definition of conscious- 
ness are an eloquent example of similar difficulties 
in many communications of psychologists.—A. R. 


Burlingham, Dorothy. Twins: a study of three 
pairs of identical twins. New York: International 
Universities Press, 1953. Pp. x + 92 + 30 charts.
$7.50. 


This report contains 30 developmental charts with 
mainly qualitative notations in various areas of be- 
havior. Little discussion is given to the methodo- 
logical problems involved in such a study. The 
greatest emphasis is placed upon the twins’ relation- 
ships to each other and to the social world around 
them. Attention is called to the active and passive 
roles that members of the pair assume. The phan- 
tasy of having twins is compared with the reality 
of twinships. Acquired differences are shown to be 
related to various factors. Identification is indicated 
as a process which keeps the twins identical in spite 
of their acquired differences.—F. McK.





Note.—The reviews were prepared by the Editor 
and the Associate Editors, who may be identified by 
their initials. 




Davidson, Audrey, & Fay, Judith. Phantasy in childhood.
New York: Philosophical Library, 1953.
Pp. viii + 188. $4.75. 


This book discusses the meaning of childhood 
phantasy from a determinedly authoritarian Freudi- 
an point of view. At times the interpretations seem 
to transcend any reasonable limits of valid logical 
inference, and it is definitely not a book for the ex- 
perimentally minded. The case material is abund- 
ant and well chosen for interest, however, and if 
the reader is displeased by the rigidly analytic in- 
terpretations placed upon the phantasy material by 
the authors, he may substitute his own. The theo- 
retical context is that of Melanie Klein and Susan 
Isaacs, and the authors’ approach is expository 
rather than critical or creative. The volume is well 
written, presented in an attractive format, and offers 
pleasant reading. —W. A. H. 


Garrett, James F. (Ed.) Psychological aspects of 
physical disability. Federal Security Agency, Of- 
fice of Vocational Rehabilitation, Rehabilitation 
Service Series No. 210. Washington: U. S. Gov- 
ernment Printing Office, 1952. Pp. vii + 195. 
45¢. 


Psychological Aspects of Physical Disability gives 
a clear description of the psychological phases of 
the various types of handicaps. Although 18 experts 
have written chapters in their special fields, the con- 
tributions are so well blended and edited that one 
has a feeling of a well-integrated whole. There is 
an excellent chapter on psychiatric aspects of physi- 
cal disability, and another on the social psychology 
of adjustment to physical disability. While the book 
is planned primarily for vocational rehabilitation 
counselors who are face to face with adult problems, 
it contains so many practical suggestions that it 
would be a valuable addition to the library of any 
psychologist. The book emphasizes the need of 
reaching the individual, inspiring him to rehabili- 
tate himself.—B. M. L. 


Garrett, Henry E. Statistics in psychology and ed- 
ucation. (4th Ed.) New York: Longmans, Green, 
1953. Pp. xii + 460. $5.00. 


This old reliable text has had only a minor face 
lifting. There are some added materials on analysis
of variance, new problems, several rearrangements,
and a new, attractive format.—L. F. S.


Harrower, Molly. Appraising personality. New 
York: Norton, 1952. Pp. xvii + 197. $4.00. 


This attractive and highly literate presentation of 
psychological techniques to persons in allied profes- 
sions is cast in the form of a dialogue between a 
psychologist and his physician friend. Gently, with 
case studies and other simple illustrations, the physi- 
cian is led to the psychologist’s use of the Rorschach, 
the Wechsler-Bellevue, figure drawings, incomplete 
sentences, and the Szondi. A final section shows the 
integration of psychological methods in solving five 
difficult diagnostic problems. The book’s shortcoming
is that it is almost too good. Persuasive exposition,
insufficiently tempered with critical appraisal,
can oversell techniques whose real validity is in
doubt.—L. F. S.


Kluckhohn, Clyde, Murray, Henry A., & Schneider, 
David M. Personality: In nature, society, and 
culture. (2nd Ed.) New York: Knopf, 1953. Pp. 
xxv + 701 + xv. Text edition, $5.75. 


In the revision of this useful sourcebook, the in- 
troductory essay on personality by Murray and 
Kluckhohn has been completely rewritten and con- 
siderably expanded. The essay, developed from the 
formula “tension—reduction of tension,” is itself 
a 47-page miniature treatise. The readings include 
32 of the 39 selections which were in the first edi- 
tion, together with 13 new ones, two of which are 
on the uses of literature for psychology.—L. F. S.


Moustakas, Clark E. Children in play therapy. New 
York: McGraw-Hill, 1953. Pp. ix + 218. $4.50. 


Describing the play therapy program with pre- 
school children at the Merrill-Palmer School of 
Detroit, the author makes a great point of the difference
between nondirective and client-centered therapy
on the one hand, and the “child-centered” therapy
discussed in this book, on the other. The difference
is hard to find in the great amount of
verbatim recording presented. He criticizes reflec- 
tion of feelings as a “repetitious, unsympathetic, 
static response.” His case illustrations are often ex- 
actly that. There are brief discussions of preventive 
play therapy, play therapy with normal children, 
situational play therapy (adapted from the author’s 
article in the Journal of Consulting Psychology, 
June 1951), and play therapy with disturbed child- 
ren. About half the volume is devoted to the treat- 
ment of one preschool child and her parents. In 
this case, the mother’s presence in the play therapy 
room, brought about by the child’s refusal to leave 
the mother, and the insight the mother thus acci- 
dentally gained, were instrumental in the improve- 
ment of behavior and attitudes that followed. This 
book may be a “key to understanding normal and
disturbed emotions” for some laymen. Psychologists
will find it superficial.—M. K.


Sarason, Seymour B. Psychological problems in men- 
tal deficiency. (2nd Ed.) New York: Harper, 
1953. Pp. x + 402. $5.00. 


Sarason’s excellent book on mental deficiency has 
been extended rather than revised. Its usefulness 
has been increased by three added chapters on 
practical problems: the interpretation of mental de- 
ficiency to parents, criteria and procedures for in- 
stitutionalization, and problems of professional 
training.—L. F. S.


Skinner, B. F. Science and human behavior. New 
York: Macmillan, 1953. Pp. x + 461. $4.00. 


An introductory textbook in psychology which 
contains not a single table, figure, or explicit ac- 
count of an experiment is indeed a rarity these days, 
especially when the author is an outstanding experimentalist.
Skinner seems to take the experimental
basis of psychology for granted, although
he pays considerable attention to its broader founda- 
tions as science. The most evident aspect of the 
author’s position is his distaste for intervening vari- 
ables. “Drive,” for example, is not a stimulus, a 
physiological state, a psychic state, or a state of 
strength but “. . . a convenient way of referring to 
the effects of deprivation and satiation and of other 
operations which alter the probability of be- 
havior. ...” To control behavior, the drive cannot 
itself be manipulated directly, but only the condi- 
tions of deprivation and satiation. The application 
of this conceptual framework to complex problems 
such as the self, anxiety, control, religion, and psy- 
chotherapy makes interesting reading. Furthermore, 
the author knows his literature, his philosophy, and 
his Freud, and bends them to his own ends. The 
result is a stimulating and challenging exercise for 
psychologists, although not all will be convinced. 
A different evaluation of the volume must be given 
with respect to its function as an introductory text- 
book. This book knows all the answers. The student 
is therefore likely to perceive psychology as a neatly 
wrapped bundle of truths, and is deprived of the 
opportunity to see scientific method as an ongoing 
process by which successive approximations of truth 
are reached without ever coming to an end.—L. F. S.


Tests 
Blum, Lucille H., & Fieldsteel, Nina D. Blum-Field- 
steel Developmental Charts. Ages 0-6 yrs. Set of 
2 charts ($2.50 per 25), with manual, pp. 8. 
Yonkers, N. Y.: World Book Co., 1952, 1953. 


Two well-designed cumulative graphs are used 
to plot a child’s development according to the ob- 
servations and norms of Gesell and Amatruda. In 
each chart, the observations are plotted as an age- 
calibrated scale on the ordinate, with chronological
age as the abscissa. The chart of Motor Behavior
is concerned mainly with postural and locomotor 
development from “head erect” to “standing on each 
foot alternately with eyes closed.” The chart of 
Functional Behavior ranges from “holds objects” to 
“ties shoelaces.” The manual suggests uses of the 
charts by psychologists, pediatricians, nursery 
schools, rehabilitation classes, and others, with nor- 
mal and handicapped children.—L. F. S. 


Maslow, A. H. The S-I [Security-Insecurity] In- 
ventory. College-adult. 1 form. Untimed, (10) 
min. Inventory ($1. per 10, $7. per 100), with 
manual, pp. 10, and key. Stanford, Calif.: Stan- 
ford Univer. Press, 1952. 


Maslow’s 75-item questionnaire is the end product 
of a series of case studies and item analyses over a 
period of more than 10 years. The outcome is a 
technique far more refined than most short question- 
naires, with good reliability (about .90) and some 
clinical evidence of validity. Numerous applica- 
tions are suggested, for research, screening, student 
personnel services, and counseling.—L. F. S.


Books Received


Andrews, T. G. (Ed.) Méthodes de la psychologie.
Paris: Presses Univer. de France, 1952. 2 vols.
Pp. vii + 882. 3.000 fr.


Bernhardt, Karl S. Practical psychology. New York: 
McGraw-Hill, 1953. Pp. xii + 337. $3.75. 


Geldenhuys, J. Norval. The intimate life. New 
York: Philosophical Library, 1952. Pp. 96. $2.75. 


Halmos, Paul. Solitude and privacy. New York: 
Philosophical Library, 1953. Pp. xvii + 181. 
$4.75. 


Kraft, Victor. The Vienna circle. New York: Phil- 
osophical Library, 1953. Pp. xii + 209. $3.75. 


Ray, Marie Beynon. The best years of your life. 
Boston: Little, Brown, 1952. Pp. xiv + 300. 
$3.95. 


Reiss, Samuel. The universe of meaning. New 
York: Philosophical Library, 1953. Pp. xi + 227. 
$3.75. 


Runes, Dagobert D. The Soviet impact on society. 
New York: Philosophical Library, 1953. Pp. xiii 
+ 202. $3.75. 


Tead, Ordway. Character building and higher 
education. New York: Macmillan, 1953. Pp. x 
+ 129. $2.00. 

Teicher, Joseph D. Your child and his problems. 
Boston: Little, Brown, 1953. Pp. x + 302. $3.75. 








































